feasibility of sampling for follow-up be studied; and that
machine-readable records be tried for a sample of the data. Further 
testing of telephone followups and further research in the area of 
verification of content were also supported. Suggestions for 
alternate ways of obtaining housing data were made. Further study of 
the feasibility of a two-stage data collection, using the short form 
and long form, was encouraged. Further studies evaluating coverage 
were strongly recommended, while it was felt that the pretest of the 
post-enumeration study methodology was too ambitious. Continuation of 
the Forward Trace Study was supported. Other issues of reliability, 
ancillary data, statistical estimation, and operational constraints 

PREFACE



In its report to the Bureau of the Census, the American Statistical 
Association Technical Panel on the Census Undercount recommended "that the 
Bureau of the Census sponsor an outside technical advisory group on 
undercount estimation and related problems" (American Statistical
Association, 1983:11). Partly in response to that recommendation, the
Census Bureau requested the Committee on National Statistics of the 
National Research Council to establish a panel: (1) to suggest research 
and experiments, (2) to recommend improved methods, and (3) to guide the 
Census Bureau on technical problems in appraising contending methods. 

The Panel on Decennial Census Methodology was charged with 
investigating three major issues from a technical viewpoint, setting aside 
legal considerations: 

(1) Adjustment of census counts and characteristics. This topic 
includes exploration of formal criteria to evaluate measures of 
undercount and alternative adjustment procedures, 

(2) Uses of sampling in the decennial census. This topic includes 
investigation of whether the sampling of lists and areas to 
improve coverage and sampling of nonrespondents for follow-up can 
improve accuracy for the total population and important subgroups 
at a given cost, 

(3) Uses of administrative records. This topic includes investigation 
of various types of records to determine their possible utility in 
improving the accuracy of census counts and the efficiency of 
census operations. 

The panel held its first meeting in January 1984 and met three times 
prior to preparation of this report. At the first meeting, we took a 
broad view of the charge and identified additional topic areas beyond 
those listed for possible investigation. For example, we decided that it
was critical to examine uses of census data and the degree of accuracy in 
the census required to satisfy each use in order to reach sensible 
conclusions regarding a choice of methodology for the decennial census. 

The Census Bureau asked the panel to produce an interim report by June 
30, 1984, which was to focus on recommendations for improvements in census 
methodology that warranted early investigation and testing. The Census
Bureau indicated that the panel's interim report, if completed by June 30,
could influence particulars of the design of the first 1990 census
pretests scheduled for 1985 and the choice of testing objectives and
procedures for 1986 and beyond.

In this interim report we have focused our efforts on three topic
areas that are central to the original charge: (1) uses of sampling for
the census count, (2) methodologies for evaluating completeness of
coverage of the census, and (3) issues related to the adjustment or
modification of census counts and characteristics. In addition, we 
reviewed the Bureau's plans for the 1985 pretest of a two-stage 
methodology for conducting the census. 

This interim report offers recommendations and issues for 
consideration in each of the listed topic areas based on our review of the
Census Bureau's research and testing plans. The panel intends to carry
out further work on these topics and to tackle other areas not covered or
covered only briefly. We may, in the final report, have occasion to
modify some of the recommendations in this interim report. Nevertheless, 
we believe that the timeliness of these initial recommendations is 
critical. We are impressed by the need for the Census Bureau to make 
choices in its research and testing program and also by the limited number 
of testing opportunities that are available compared with the range of 
ideas that appear attractive to try out. Hence, we have striven to 
provide early guidance to the Census Bureau regarding what we believe, at 
this stage of our review, to represent the most promising avenues to 
pursue. 

John W. Pratt, Chair 
Panel on Decennial Census 
Methodology 
1. INTRODUCTION



The next decennial census of population and housing in the United States 
is scheduled to take place on April 1, 1990. Planning for this census,
which will be the nation's twenty-first in an unbroken series since 1790, 
officially began last fall with an appropriation for fiscal 1984. Well 
before that date, substantial work of direct relevance for 1990 was 
conducted. The 1980 decennial program in fact included several 
experiments and postenumeration studies designed to help plan improvements 
in methodology for subsequent censuses. 

To the general public and many casual users of census data, it may 
appear that the Bureau of the Census has ample time to plan wisely for the 
1990 census. In fact, there are relatively few opportunities to 
thoroughly test changes or modifications to census procedures, 
particularly if the changes represent major departures from the past. 
Moreover, only tests conducted under census conditions, that is, 
experiments incorporated into the next census as distinct from pretests, 
can adequately assess the impact of alternative procedures on public 
cooperation with the census. 

The Census Bureau's testing program for 1990 got under way this spring
with tests of address compilation methods in several localities around the 
country (Bureau of the Census, 1984b). Two large-scale pretests are 
planned for spring 1985. Pretests will also be conducted in 1986 and 
1987. Finally, the research and testing program will culminate in 1988 in 
"dress rehearsals" of the procedures planned for 1990. 

This testing schedule means that the Census Bureau's only opportunities 
to try out new procedures and concepts for 1990 are the pretests scheduled 
for 1985, 1986, and 1987. The dress rehearsals, as the name implies, are
not used to test new ideas but to run through the procedures the Census 
Bureau expects to follow in the decennial census itself. The only changes 
the Census Bureau anticipates from the dress rehearsals are corrections of 
problems encountered in the field, not innovations in census procedures at 
that late date. 

In addition to the compressed time schedule for testing and research, 
two other critical factors affect the ability of the Census Bureau to 
modify census methodology: staff and budget resources. The Census Bureau 
has long been known for the high quality and dedication of its technical
staff. The current budget for research on decennial census methodology,
particularly for research on the undercount, is generous by the standards
of earlier censuses. Nevertheless, no agency of government, particularly
in the constrained world of the 1980s, can expect to have sufficient staff
or resources to try out more than a few promising ideas and concepts. It
is critical to designing the best census for 1990 that the Census Bureau
choose priorities for the expenditure of resources and staff time wisely
and that it make the most of the testing opportunities afforded over the
next few years.

Why is it so important to choose wisely among alternatives for testing
and research for the 1990 census? The decennial census has been a source
of controversy throughout its history. Numerous instances can be cited
from the past of criticism impugning the accuracy of census figures and
questioning the procedures and costs of conducting the census (Bureau of
the Census, 198< :App ,111b; Conk, 1984). Yet it appears that social and
political forces have converged in recent years to make the census in this
country, and in other countries as well, a matter of greater controversy
than before.

On one hand, there is increased concern with the need to protect the
privacy of individual citizens and a sense that the public is oversurveyed
and less willing to respond to government inquiries. Indeed, in the last
few years, the level of public suspicion and hostility to plans for the
census caused the governments of several Western European countries to
delay their census programs or cancel them entirely (see Redfern, 1983).

On the other hand, legislators have more and more frequently turned to
statistics to handle tough policy decisions. In fiscal 1981, federal
grant-in-aid programs allocated well in excess of $50 billion to states
and local areas via formulas that depended in important ways on census
figures (or statistics based on census figures, such as current population
estimates) to determine who got how many dollars (Emery et al., 1980;
Gonzalez, 1980; Office of Management and Budget, 1983:Chap. 5). Census
data are used by constitutional mandate to determine the number of seats
in the U.S. House of Representatives that are allotted to each state.
They are used as well in drawing up congressional and state and local
legislative districts to meet rigid criteria for equitable representation
of the population. In addition to these critical governmental needs,
census data support many other major uses. Data from the latest census
serve to document the social and economic condition of the country as a
whole and of small areas and groups in the population. Comparative
information from successive censuses serves to illuminate trends over
time. Researchers, planners, and decision makers in business, government,
and academic institutions make use of census data for a wide range of
important planning and analysis purposes. All of these uses have
underscored more than ever before the importance of obtaining a complete
and accurate count of the population as well as accurate data about
characteristics.

Yet obtaining highly accurate data costs money. The 1980 census cost
close to $1.1 billion, about $4.75 for each inhabitant of the United
States (Bureau of the Census, 1983b:88). The per capita amount is small
compared with the per case cost of most government and private-sector
sample surveys. Moreover, the costs of the census include planning,
collection, and processing activities that span most of a decade and
provide data that are of value for the decade and beyond. Nonetheless,
costs for data collection that are at the billion-dollar level excite
comment and invite close scrutiny to determine how they might be reduced.

Moreover, research conducted by the Census Bureau itself has shown
that, while the 1980 census appears to have achieved the most complete
coverage in the country's history, there still were inaccuracies. Most
significantly, as in previous censuses, important race, sex, and age
subgroups of the population experienced differential rates of net
undercoverage. There is strong evidence that the black population was
undercounted by about 5 percent nationwide. Black males ages 25-54 appear
to have had the highest net undercount rates. Coverage estimates for
whites and other races are difficult to derive because of the lack of
reliable estimates of net legal and illegal immigration. Under a range
of reasonable assumptions about the size of the illegal alien population,
it appears very likely that whites and other races experienced net
undercount in the 1980 census, but that the rate of undercount was smaller,
and perhaps significantly smaller, than the 1.5 percent rate experienced in
1970 (see Passel et al., 1982:6-8).
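Net undercount estimates of this kind compare an independently constructed population estimate (demographic analysis, built from births, deaths, and immigration) with the census count. The sketch below uses a deliberately simplified component model to show why missing immigration data make the estimates for whites and other races so uncertain: the computed rate moves with the assumed undocumented population. The function names and all figures are hypothetical illustrations, not Census Bureau methods or estimates.

```python
def demographic_estimate(base_pop, births, deaths, net_legal_immigration, undocumented):
    """Simplified component estimate (hypothetical model): a base population
    plus births, minus deaths, plus net legal immigration and an ASSUMED
    undocumented population."""
    return base_pop + births - deaths + net_legal_immigration + undocumented


def net_undercount_rate(estimated_pop, census_count):
    """Net undercount as a share of the estimated true population.
    Positive when omissions exceed erroneous inclusions."""
    return (estimated_pop - census_count) / estimated_pop


# Hypothetical inputs in millions, chosen only to illustrate the arithmetic.
CENSUS_COUNT = 100.0
for undocumented in (0.0, 1.0, 2.0):
    est = demographic_estimate(base_pop=95.0, births=12.0, deaths=8.0,
                               net_legal_immigration=2.0, undocumented=undocumented)
    rate = net_undercount_rate(est, CENSUS_COUNT)
    print(f"undocumented={undocumented:.1f}M  estimate={est:.1f}M  net undercount={rate:.2%}")
```

With these made-up inputs, each additional million assumed undocumented residents raises the computed net undercount rate by roughly one percentage point, which is the kind of sensitivity the text describes.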

Differential undercount means possible inequities in redistricting and
fund allocation based on census data. The belief that errors in the
census affected fund allocation gave rise to an unprecedented number of
lawsuits following the 1980 census. By October 1981, over 50 suits had
been filed challenging the census results (Bureau of the Census,
1983b:85). Currently, testimony has just been completed in a major case
in which the State and the City of New York are suing to have the Census
Bureau adjust the 1980 census counts; 23 other cases are awaiting
settlement of the New York suit.

Not surprisingly, many ideas have been proposed to improve the 
decennial census. Some ideas are directed principally at improving 
coverage and reducing differential coverage errors. One idea in this
class is to use administrative records, such as driver's license lists and
other sources, to match against the census to identify people who should
be added to the census count. The Census Bureau used this approach in a
few large cities in 1980 (Bureau of the Census, no date-a). Other ideas
are directed principally at reducing costs. One such approach is to make
use of sampling, not only to obtain information on characteristics, as is
currently standard decennial census practice, but also as part of the
procedure to obtain the count. For example, one could attempt contact
with a sample of households that do not mail back their questionnaires, 
rather than all nonrespondents, in the follow-up stage of census 
operations. 

Two important themes stand out in current discussions of methodology
for the decennial census. One relates to the degree of emphasis that
should be given to counting versus estimation. A census, no matter how
diligently administered, can never be complete or without error. Hence,
a census, as is true of any survey, provides an estimate of the
population. From this recognition has come a view of the decennial
process, expressed most often by members of the statistical community,
that emphasizes the role of estimation. Its proponents argue that some of
the resources for conducting the decennial census should be shifted from
efforts directed toward traditional coverage improvement procedures to
efforts directed toward developing the best possible estimates of the
total population and subgroups. Input to the decennial year population
estimates, in one version of this view (Ericksen and Kadane, 1983), would
include a reasonably well-conducted census, but also information obtained
from various programs conducted on a sample basis, such as matching of
administrative lists to census records, that would provide a basis for
adjusting the census counts. There is by no means agreement within the
statistical community, let alone other disciplines, on the merits of the
various suggestions put forward to incorporate estimation into the census
process. Nevertheless, the known errors and the incompleteness of the
census count mean that the issue of adjusting census figures needs to be
addressed.

The other theme relates to the critical importance of evaluation
programs in the methodology of the decennial census. Politicians, policy
analysts, statisticians, economists, demographers, other social
scientists, and users of census data in all sectors have expressed widely
divergent views regarding the most appropriate methodology for conducting
the census. But whether they view the census in traditional terms as
strictly a counting operation or believe that the census should be the
starting point for an estimation process, there is substantial agreement
on the importance of evaluating the completeness and accuracy of census
statistics.

The Census Bureau has conducted formal evaluation programs for every
census since 1950 (Bureau of the Census, no date-a). All of the
techniques used to date, in this country and abroad, including demographic
analysis, reverse record checks, administrative record matches, and
postenumeration surveys (whether recanvassing selected areas or matching
independent surveys to census records), have important flaws. In the
United States today, the absence of adequate data for estimating net
immigration, whether of legal or illegal residents (Marks, 1980), poses
particularly severe problems for evaluating the census count even at the
national level. Nevertheless, with concern over possible inequities in
political representation and the distribution of large amounts of federal
dollars, there has never been a greater need for thorough evaluation of
the decennial census. This evaluation is necessary whether the object is
to inform users of known errors in the census or actually to modify census
results.
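Postenumeration surveys that match an independent survey to census records are commonly summarized with a dual-system (capture-recapture) estimator. The sketch below uses the simple Petersen form, N ≈ C × S / M, and assumes that the survey is independent of the census, that the population is closed, and that matching is error-free; failures of exactly these assumptions are among the flaws noted above. The function name and the counts are hypothetical.

```python
def dual_system_estimate(census_count: int, survey_count: int, matched: int) -> float:
    """Petersen (capture-recapture) estimate of an area's true population.

    census_count: people the census correctly enumerated in the area
    survey_count: people an independent survey found in the same area
    matched:      survey people who were also found in the census records

    Assumes independence of census and survey, a closed population,
    and perfect matching (all simplifying assumptions).
    """
    if matched <= 0:
        raise ValueError("need at least one matched person")
    if matched > min(census_count, survey_count):
        raise ValueError("matches cannot exceed either list")
    return census_count * survey_count / matched


# Hypothetical area: the survey finds 1,000 people, 900 of whom match census records.
est = dual_system_estimate(census_count=9_000, survey_count=1_000, matched=900)
print(f"estimated population {est:,.0f}, census coverage {9_000 / est:.1%}")
```

In this made-up case the 90 percent match rate implies that the census covered about 90 percent of the population, giving an estimate of 10,000. If people missed by the census were also harder for the survey to find, the estimate would be biased downward.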

While there is widespread agreement that evaluation is important and
that the issue of adjustment must be faced, many decisions on methodology
for 1990 remain to be made. It is clear that there is no lack of ideas
and suggestions that appear useful to investigate. It is also clear that
the process of determining a reasonable methodology for 1990 will involve
difficult choices. In particular, it is not possible to achieve both
maximum accuracy and minimal cost; rather, as Keyfitz (1979) has noted,
explicit cost-benefit trade-offs must be made.

The Census Bureau is actively working on methodology for the 1990
census and is seeking advice and ideas from a wide range of groups and
individuals representing many points of view. The Census Bureau has
assembled a staff to plan the 1990 census and is recruiting a research
staff specifically to work on issues of undercount and the possible
adjustment of census counts. The panel commends and hopes to aid these
efforts to design and carry out a thorough research and testing program
that will support sound decisions regarding methodology for the 1990 and
later censuses. Resources invested in careful research and testing
represent the best possible investment for a cost-effective census in
1990.

The Census Bureau's planning staff recently prepared detailed research
agendas on the following topics that correspond very closely to the
priority areas the Panel on Decennial Census Methodology was asked to
address:

— "Research Plan on Uses of Sampling in the Census Count" (Miskura et
al., 1984).

— "Research Plan on Adjustment for the 1990 Decennial Census" (Hogan,
1984). This document covers research directed toward improved
programs for evaluating census coverage as well as research in the
area of adjustment of census counts per se.

— "Record Linkage Research Plan" (Jaro, 1984).

— "Draft Research Plan on Uses of Administrative Records" (Harahush,
1983).

In the preparation of this interim report, the panel and staff, working
through subgroups, examined the first two research plans listed above on
the uses of sampling and coverage evaluation and adjustment. Panel
members intend to complete, but have not yet completed, a review of the
research plans on administrative records and record linkage. The panel
also reviewed the Census Bureau's plans, both in written form and through
discussions with staff, for the pretests planned for 1985.

The remainder of this report provides the panel's thinking to date and
recommendations in the following areas:

— Chapter 2: Uses of sampling for the census count. Based on the
work of the panel's subgroup on sampling, which reviewed the
Miskura et al. research plan and related materials, the panel
developed several recommendations regarding priorities for research
and testing on uses of sampling in the decennial census.

— Chapter 3: Early pretests. Panel members examined plans for the
1985 pretest in Jersey City of a two-stage methodology that
separates collection of the sample (long-form) data from the basic
count. The panel developed recommendations for ways to design this
pretest to better measure the benefits and costs of the two-stage
procedure. The panel also developed recommendations related to
other kinds of coverage improvement procedures that the panel
believes deserve early testing.

— Chapter 4: Coverage evaluation methodologies. The panel's
subgroup on coverage evaluation reviewed relevant portions of the
Hogan research plan and related materials. Based on its work, the
panel developed recommendations regarding priorities for research
and testing of improved methodologies for assessing the
completeness of census coverage.

— Chapter 5: Adjustment techniques. The panel's subgroup on
adjustment of census counts reviewed relevant portions of the Hogan
research plan. The subgroup did not at this time suggest
recommendations to the panel, but outlined a series of issues for
the panel to address in its final report that must be considered in
any decision to modify the 1990 census or subsequent censuses for
coverage or content errors.

The research plans drafted by the Census Bureau staff are extremely 
comprehensive and ambitious. The staff has clearly tried to include all
reasonable ideas for consideration in their research and testing program.
The thrust of the panel's comments is to single out from these plans the
priority areas for research and testing. The panel has also emphasized
the cost-effectiveness of thorough analysis of the results of the 1980
census and the various experiments and evaluation programs conducted for
1980 and prior censuses. Throughout, the panel has been guided by the
belief that the Census Bureau must make the most of limited budget, staff,
and testing opportunities. It is vitally important that the research
program for the 1990 census be designed to provide a cumulative knowledge
base and that the Census Bureau not attempt to try out so many ideas that
pretest results cannot be effectively digested. The panel does not
pretend to have the answers regarding the "best" methodology for the
decennial census or even the "best" testing program. The panel has
endeavored, at this stage of our work, to identify the ideas and concepts
that appear most promising for early testing and research. 






2. THE USES OF SAMPLING FOR OBTAINING THE DECENNIAL CENSUS COUNT 



Panel members reviewed the paper prepared by staff of the Bureau of the
Census, "Research Plan on Uses of Sampling in the Census Count" (Miskura et
al., 1984), and other relevant materials. The Miskura et al. paper
describes four applications of sampling for the decennial census and 
proposes research projects for each type of use: (1) obtaining the census 
count on a sample basis, (2) using sampling for follow-up of unit 
nonresponse in the census, (3) using sampling for verification and possible 
correction of specific subject items during the census, and (4) using 
sampling for coverage improvement operations. We present and discuss the 
recommendations of the panel for each of these areas in turn below. 



TAKING A SAMPLE CENSUS 

Currently, decennial census methodology involves collecting the majority of 
population and housing characteristics from only a sample of households, who 
receive the "long-form" census questionnaire. (Sample sizes for the 
long-form items in recent censuses have ranged from 3.3 to 50 percent and
are typically 20 or 25 percent.) However, the counts of persons and housing 
units as well as basic characteristics, such as age, race, sex, and marital 
status of the population and tenure and number of rooms for housing, are 
attempted on a complete count or 100 percent basis. 

The concept of taking a "sample census," i.e., taking a large sample 
survey instead of a full census to obtain the count of the population and 
related basic characteristics, has been suggested as a means to effect a 
significant reduction in costs while still satisfying the primary 
information needs served by a full census (see, for reference, Bureau of the 
Census, 1982a; Kish, 1979). 

Miskura et al. propose several research projects intended to result in 
a possible design for a sample census. The first project, which is planned 
for the period from June through September 1984, is to develop appropriate 
sampling error estimates for alternative designs for a sample census. The 
second project, scheduled for the period October 1984 through March 1985, is 
to develop total error models (including sampling and nonsampling error) for 
the sample designs investigated in the first project. The second project 
would investigate the theoretical reduction in nonsampling error required to
obtain overall accuracy at least equal to that of a complete count. The
Census Bureau staff would then develop cost models and estimate cost model
parameters for a sample census. Based on the results of the research, the
staff would specify a sample census methodology to be tested initially in 
1986. 



Problems Involved in a Sample Census 

The panel believes that the concept of replacing the census with a large 
sample survey should be given a low priority in the Census Bureau's 1990 
research and testing program for a number of reasons that relate principally 
to census purposes, costs, and coverage. 

With regard to purposes, the decennial census is the only comprehensive 
source of data for very small geographic areas such as towns, census tracts, 
and city blocks. There are important needs for data about small areas, 
including: redistricting of congressional, state, and local legislative 
districts, which requires block counts by age and race to meet 
court-mandated criteria for population equality and compactness of districts 
(Bureau of the Census, no date-b); revenue sharing, which requires 
population and income data for 39,000 political jurisdictions that include 
many very small towns, villages, and special districts; and many other 
important policy planning and analysis purposes at the state and local 
level. Moreover, the model-based estimation techniques that are used to 
produce small-area data intercensally for revenue sharing and other purposes 
must be evaluated and recalibrated periodically against the census. 

To obtain small-area population counts and basic characteristics from a 
sample survey to satisfy the uses outlined above would require a large 
sampling rate, perhaps as high as 50 percent for small jurisdictions; 
otherwise, sampling errors would be unacceptably large. Moreover, it would 
not be feasible to design a clustered area sample that sampled the 
population of only some geographic areas such as selected counties or 
cities, because small-area data are needed for every geographic entity of 
the country. Yet to select a large unclustered sample would probably 
require an attempt to list all housing units. Given these factors, namely 
a large sampling rate, 100 percent address listing, and an unclustered 
design, the panel is doubtful that costs could be significantly reduced, if 
at all, in comparison with a full census. 
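A rough calculation makes the small-area sampling-error argument concrete. The sketch below assumes the simplest possible unclustered design, in which each person is included independently with probability f (Bernoulli sampling), so the sample count n is Binomial(N, f) and the expanded count n/f has variance N(1 - f)/f. It is an illustration under that assumption only, not a model of any Census Bureau design; the function name is hypothetical.

```python
import math


def count_cv(population: int, rate: float) -> float:
    """Coefficient of variation of the expanded count n / rate, where each of
    `population` people is sampled independently with probability `rate`
    (Bernoulli sampling, no clustering). n ~ Binomial(N, f), so
    Var(n / f) = N * (1 - f) / f."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    return math.sqrt(population * (1.0 - rate) / rate) / population


# Compare a small town, a mid-sized place, and a small city at three sampling rates.
for pop in (500, 5_000, 50_000):
    row = "  ".join(f"f={f:.2f}: {count_cv(pop, f):6.2%}" for f in (0.05, 0.20, 0.50))
    print(f"N={pop:>6,}  {row}")
```

Under these assumptions a 500-person town sampled at 5 percent would have a coefficient of variation near 20 percent on its count, and even a 50 percent sample leaves about 4.5 percent. Sampling whole households rather than individuals would generally widen these figures.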

Substantial cost savings from sample surveys occur when administrative 
overhead costs can be reduced by eliminating entire segments of field 
operations. Such reductions can be achieved using a clustered design, but 
a design that requires sampling in every county and city would necessitate 
the same number of field offices as is required for a full census. 
Moreover, while the size of the interviewer staff could be reduced somewhat, 
a large sample survey would entail additional costs for drawing and 
controlling the sample. 

Finally, there is the issue of completeness of coverage obtained by a 
large sample survey compared with the full census. There is a large body of 
evidence in both the United States and other countries that the census 
obtains more complete population coverage than even the best-executed sample 
survey (Redfern, 1983; Yuskavage et al., 1977). In fact, even the samples
taken in conjunction with the census generally produce lower population
figures than the complete census (Waksberg et al., 1973). One possible
reason for this finding is that the publicity surrounding a census elicits 
greater cooperation from the public than can be obtained in surveys. While, 
of course, the Census Bureau would mount a publicity campaign for a sample 
census, it would be difficult to include a question like "Were you counted?" 
when only a fraction of the population is supposed to respond. Similarly, 
the field operations of a census are geared toward finding every housing 
unit and person and adding missed units to the address list developed in 
advance of the census. For a sample census, it is unlikely that the same 
effort would or could be put into adding units to the sampling frame, with 
the result of less complete coverage. 

The less complete coverage obtained by a sample census compared with 
current methodology would have important adverse implications for many 
important uses of census data. Concerns about inequities resulting from 
undercoverage and particularly differential undercoverage of important 
subgroups of the population are already very strong. Substituting a large
sample survey for the census would deepen these concerns still further. The
decennial census is also used as the basis for the design of current surveys
in both the public and private sectors and to benchmark current population
estimates. Less complete coverage would adversely affect these uses of
census information.

Recommendation 2.1. We recommend that for 1990 the Census Bureau put
low priority on research and testing directed toward taking a sample
survey instead of a census for the count and basic characteristics.



Estimating the Costs of a Sample Census 

While we have expressed strong doubts about the utility of a sample census, 
we believe that it could be useful to obtain rough cost estimates if this 
estimation could be accomplished with a modest amount of effort. The 
methodology of a sample census stands at one extreme on a continuum for 
which the other extreme is a census that asks all questions on a complete 
count basis. It would be useful to be able to make approximate comparisons 
of costs at various points on the continuum, including the extremes. It may
be that estimates prepared in the 1970s for conducting a mid-decade census
on a sample basis would provide a ready base for estimating the costs of a
sample census in 1990. The panel intends to explore with the Census Bureau
ways of obtaining relevant information for costing out a sample census using 
an unclustered design and assuming three or four alternative sampling rates. 



THE USE OF SAMPLING FOR FOLLOW-UP 

On the assumption that the next census will make at least one attempt to 
count everyone in the population, i.e., that the census will not be replaced 
entirely by a sample survey, the idea has been put forward that perhaps 
sampling could be used in the follow-up stage of census operations as a 
means of reducing costs (Bureau of the Census, 1982a, 1983a; Ericksen and
Kadane, 1983; General Accounting Office, 1982). A census carried out with
the use of sampling for follow-up could, for example, at a specified date 
after Census Day, draw a sample of addresses from which a completed census 
form had not been returned and follow up only those addresses. The total 
number of housing units and persons represented by the cases that were
followed up would then be estimated and added to the counts for the
households that mailed back their questionnaires. The Miskura et al. paper outlines
research projects intended to provide a sound methodological basis for 
designing follow-up operations to be carried out for a sample of 
nonresponding units. These projects are similar to those proposed in 
connection with conducting the entire census on a sample basis, namely to 
develop sampling error estimates and total error models for alternative 
sampling designs, except that the focus is on sampling in the follow-up 
stage of census operations. Again, these research endeavors would lead to 
a pretest of sampling for follow-up in 1986. 
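
The estimation step just described can be illustrated with a short Python sketch. It assumes, for illustration only, that the follow-up sample is a simple random sample of the nonresponding addresses within an area, so that each sampled address stands for (nonresponding addresses) / (sample size) addresses. The names AreaCounts and estimate_total_persons are hypothetical, and an actual design would probably be stratified.

```python
from dataclasses import dataclass


@dataclass
class AreaCounts:
    """Hypothetical summary of one area at the follow-up sampling cutoff date."""
    mail_return_persons: int      # persons reported on mail-returned forms
    nonresponding_addresses: int  # addresses with no form returned by the cutoff
    sampled_persons: list[int]    # persons enumerated at each sampled address


def estimate_total_persons(area: AreaCounts) -> float:
    """Mail-return count plus an expansion estimate for nonresponding addresses.

    Assumes a simple random sample of nonresponding addresses, so each
    sampled address represents nonresponding_addresses / sample_size addresses.
    """
    m = len(area.sampled_persons)
    if m == 0:
        raise ValueError("follow-up sample is empty")
    weight = area.nonresponding_addresses / m
    return area.mail_return_persons + weight * sum(area.sampled_persons)


# Hypothetical area: 900 persons from mail returns, 200 nonresponding
# addresses, of which 4 were followed up (one turned out to be vacant).
area = AreaCounts(mail_return_persons=900, nonresponding_addresses=200,
                  sampled_persons=[3, 0, 2, 4])
print(estimate_total_persons(area))  # 900 + 50 * 9 = 1350.0
```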



Problems Involved in Sampling for Follow-Up 

The panel believes that the use of sampling for follow-up has some of the 
same drawbacks as the concept of replacing the census entirely with a large 
sample survey, although the problems are on a smaller scale. Specifically, 
we believe it is unlikely that significant cost savings could be achieved by
conducting follow-up for unit nonresponse on a sample basis rather than as a
100 percent effort. Because a highly clustered design could not be used, given that
follow-up operations must be carried out in every geographic area, there 
would be no opportunity to effect sizable savings by eliminating entire 
segments of field operations. Moreover, there would be the added costs of 
drawing and controlling the sample. The possibilities of confusion caused 
by a large sampling operation concurrent with the census should not be 
underestimated. Mail returns may well come in, for example, after the
cutoff date for drawing the follow-up sample, with consequent practical
problems in determining whether and how to integrate late returns with the
sample. Carrying out follow-up operations on a sample basis would also pose 
problems for coverage improvement and coverage evaluation programs that 
involved matching individual records. 

Sampling for follow-up would introduce sampling errors that might be 
unacceptably large for small areas. Moreover, careful attention would need 
to be given to the sample design and determination of sampling fractions, 
given the likelihood of large variations in initial mail response rates 
across geographic areas. For example, in 1980, Madison, Wisconsin, had a 
mail return rate of over 90 percent, while the rate for the central Brooklyn
district office was no more than 55 percent (Ferrari and Bailey, 1983:59). 
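
How sampling error depends on area size and mail return rate can be made concrete with the textbook variance of an expansion estimator under simple random sampling without replacement. The sketch below assumes, hypothetically, that every area uses the same follow-up sampling fraction and that persons per nonresponding address have a known mean and variance equal to those of the area as a whole. The return rates 0.90 and 0.55 echo the Madison and Brooklyn figures above; the area sizes and household-size parameters are invented for illustration.

```python
import math


def follow_up_cv(housing_units: int, mail_return_rate: float,
                 sampling_fraction: float, mean_persons: float,
                 var_persons: float) -> float:
    """Approximate coefficient of variation of an area's estimated population.

    Only nonresponding addresses are sampled, so only they add sampling
    variance. Uses the SRS-without-replacement variance of an expansion
    estimator, M**2 * (1 - f) * S**2 / m, where M is the number of
    nonresponding addresses and m = f * M is the sample size.
    """
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    nonresponding = housing_units * (1.0 - mail_return_rate)  # M
    if nonresponding <= 0:
        return 0.0  # every unit mailed back; nothing is estimated
    sample_size = sampling_fraction * nonresponding            # m
    variance = (nonresponding ** 2 * (1.0 - sampling_fraction)
                * var_persons / sample_size)
    # Approximate the area's total by housing units times mean persons per unit.
    return math.sqrt(variance) / (housing_units * mean_persons)


# Hypothetical inputs: a 1-in-3 follow-up sample, 2.7 persons per unit,
# and a variance of 2.5 in persons per unit among nonrespondents.
for units in (500, 5_000, 50_000):
    for rate in (0.90, 0.55):
        cv = follow_up_cv(units, rate, 1 / 3, 2.7, 2.5)
        print(f"{units:>6} units, return rate {rate:.2f}: CV {cv:.2%}")
```

Under these assumptions the coefficient of variation scales with one over the square root of the number of housing units, so an area one-hundredth the size has ten times the relative error at the same sampling fraction, and low-return areas carry more of it.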



Sampling in the Final Stages of Follow-Up 

There may be reason to believe that sampling for follow-up could prove
cost-effective in the very final stages of follow-up operations. It has
been estimated that the costs to count an additional person rise sharply as
one moves toward those people who are harder to locate. That is, the per-case
costs to enumerate people requiring multiple follow-ups or special
coverage efforts are many times the per-case costs for those persons who
mail back their questionnaires (Keyfitz, 1979; National Research Council,
1978). Hence, the benefits of sampling in the final stages of follow-up
might well outweigh the drawbacks.
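
The argument can be illustrated with a simple cost model in which each successive follow-up stage handles fewer cases at a higher cost per case, and only the final stage is carried out on a sample basis. The stage names, case counts, and unit costs below are hypothetical placeholders, not census figures.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    cases: int            # cases still unresolved when the stage begins
    cost_per_case: float  # hypothetical field cost per case in this stage


def follow_up_cost(stages: list[Stage], final_stage_fraction: float = 1.0) -> float:
    """Total follow-up cost when only the last stage is worked on a sample basis.

    final_stage_fraction = 1.0 reproduces a 100 percent follow-up effort;
    earlier stages are always worked in full.
    """
    total = sum(s.cases * s.cost_per_case for s in stages[:-1])
    last = stages[-1]
    total += final_stage_fraction * last.cases * last.cost_per_case
    return total


# Hypothetical stages with sharply rising per-case costs.
stages = [
    Stage("first visit", cases=1_000, cost_per_case=10.0),
    Stage("second visit", cases=400, cost_per_case=20.0),
    Stage("final attempts", cases=100, cost_per_case=80.0),
]
full = follow_up_cost(stages)
sampled = follow_up_cost(stages, final_stage_fraction=0.25)
print(f"full: {full:.0f}, sampled final stage: {sampled:.0f}, "
      f"saving: {1 - sampled / full:.1%}")
```

In this made-up example the final stage holds under 7 percent of the cases but about 31 percent of the total cost, so following up one case in four at that stage cuts total follow-up cost by roughly 23 percent. The savings would of course have to be weighed against the added sampling error.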



The Merits of Research on Sampling 

On balance, we believe sampling for follow-up in the census context presents 
serious problems. Nonetheless, we believe that it would be useful for the 
Census Bureau to carry out research designed to provide a body of evidence 
for a decision whether to field test use of sampling in census follow-up 
operations or to drop the idea. We suggest as a first step that the Census 
Bureau analyze data from the 1980 census and also from early pretests to 
simulate sampling under different mail response rate scenarios. (Our 
proposal is in basic agreement with the research plan outlined in Miskura et 
al.) The analysis should attempt to identify stages of follow-up (first 
round, second round, etc.) and, for each stage, determine cost structures 
and response patterns to assess the possible cost-effectiveness of 
sampling. The analysis should also examine cost functions and calculate 
sampling error and expected contribution to total error for different sized 
geographic areas and areas differing in mail response rates. 

Recommendation 2.2. We recommend that the Census Bureau analyze 1980 
census and early pretest results to simulate sampling procedures and 
develop cost and error structures under varying assumptions regarding 
mail response rates. The analysis should attempt to identify stages of 
follow-up (e.g., first round, second round) and, for each stage, assess 
the possible cost-effectiveness of sampling. We also recommend that the 
Census Bureau assess the logistic feasibility of sampling for follow-up, 
perhaps through expert group discussions prior to engaging in field 
tests. 

The recommended analysis would be particularly useful if data were 
available on the follow-up status of individual households for a sample of 
enumeration districts; that is, differentiating households that returned 
their questionnaires without prodding from those that required one, two, 
three, or four follow-ups (the maximum prescribed in 1980). Unfortunately, 
these data were not captured in machine-readable form in 1980. 

Recommendation 2.3. We recommend that the Census Bureau keep
machine-readable records on the follow-up status of households for a 
sample of areas in the upcoming pretests and in the 1990 census, so that 
information for detailed analysis of the cost and error structures of 
conducting census follow-up operations on a sample basis will be 
available. 
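
A machine-readable follow-up record of the kind recommended could be quite small. The layout below is a hypothetical sketch, not a Census Bureau format: one record per household in a sample area, carrying the number of follow-ups needed before the household was enumerated (0 for a questionnaire returned without prodding, up to the maximum of four prescribed in 1980). Tabulating such records by area would give the stage-by-stage counts that the analysis in Recommendation 2.2 calls for.

```python
from collections import Counter
from dataclasses import dataclass

MAX_FOLLOW_UPS = 4  # maximum number of follow-ups prescribed in 1980


@dataclass(frozen=True)
class FollowUpRecord:
    """Hypothetical record: one household in a sample area."""
    area_id: str
    household_id: str
    follow_ups: int  # 0 = questionnaire returned without prodding

    def __post_init__(self) -> None:
        if not 0 <= self.follow_ups <= MAX_FOLLOW_UPS:
            raise ValueError(f"follow_ups out of range: {self.follow_ups}")


def tabulate_by_area(records: list[FollowUpRecord]) -> dict[str, Counter]:
    """Count households by number of follow-ups required, within each area."""
    table: dict[str, Counter] = {}
    for rec in records:
        table.setdefault(rec.area_id, Counter())[rec.follow_ups] += 1
    return table


# Hypothetical enumeration district identifiers and households.
records = [
    FollowUpRecord("ED-001", "H1", 0),
    FollowUpRecord("ED-001", "H2", 0),
    FollowUpRecord("ED-001", "H3", 2),
    FollowUpRecord("ED-002", "H4", 4),
]
print(tabulate_by_area(records))
```
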



Telephone Follow-Up

Finally, with regard to census follow-up operations, we noted with interest
the report on the telephone follow-up experiment conducted during the 1980 
census (Ferrari and Bailey, 1983). For this experiment, in seven district 
offices a sample of units in the address registers that were not in 
multiunit structures and had not sent back questionnaires by mid-April was 
selected for telephone follow-up using telephone directories organized by 
address. (In one district office, a sample of units in multiunit structures 
was also drawn.) The other nonresponding units in these offices were 
followed up by enumerators according to standard census practice.
Preliminary results indicated several advantages for the telephone 
technique, namely lower costs per completed interview compared with personal 
follow-up, lower item nonresponse rates for many items, and fewer duplicate 
questionnaires. Refusal rates were similar for both techniques. A 
disadvantage of telephone follow-up was that the directories lacked listings 
or had out-of-date listings for many addresses. 

The report of the experiment, in addition to documenting results, 
describes in some detail operational problems that were encountered in 
administering the experiment. For example, a higher than expected rate of 
return of mail questionnaires after the sample selection date reduced the 
actual sample size of the telephone follow-up samples. The regular field 
office staff and the experiment staff also had problems working smoothly 
together in some offices. 

Recommendation 2.4. We recommend that the Census Bureau conduct further
testing of telephone follow-up for unit nonresponse. We also recommend
that the Census Bureau review the operational difficulties encountered
in the 1980 telephone experiment for their relevance to sampling for
follow-up.



SAMPLING FOR VERIFICATION 

Miskura et al. distinguish between the use of sampling for verification and 
the use of sampling for coverage improvement. They define the former as a 
sampling operation involving reinterview of units to determine the accuracy 
of the information obtained — in other words, content evaluation. The latter 
refers to special coverage improvement programs designed to add units to the 
census count, such as cross-checking with administrative lists, carried out 
on a sample basis. We note that coverage improvement programs should be 
concerned not only with adding units and persons, but also with ensuring 
that persons are not counted more than once in the census. 

With regard to content evaluation, the Census Bureau traditionally has 
evaluated the quality of reporting in the decennial census through sample 
surveys reinterviewing census respondents after Census Day. Other means, 
such as matching to administrative records, have also been used for content 
evaluation. To date, virtually all content evaluations have been carried 
out on a postcensus basis (Bureau of the Census, 1978b; Miskura and 
Thompson, 1983). The results have been used to improve questionnaire design 
in subsequent censuses and in other ways, but have not been used to alter
responses to the census itself. 

The only exception known to the panel occurred in 1970 when a 
verification operation, the National Vacancy Survey, was carried out on a 
sample basis and used to adjust the census results. The National Vacancy 
Survey rechecked the occupancy status of 15,000 housing units originally 
classified as vacant in the census. On the basis of the results, 11 percent 
of vacant year-round housing units were reclassified as occupied. Persons 
were imputed for these units, totalling about 0.5 percent of the population 
in 1970, as were all the housing and person characteristics (Bureau of the 
Census, 1976:8-30). 

The Miskura et al. paper discusses the application of sampling for 
verification during census operations and proposes several research projects 
in this area. One project would consider sample design issues for each
potential use of sampling, such as the development of a sampling frame, the 
choice of sample unit, selection procedures, and possible stratification. 
Estimates of variances associated with particular designs and total error 
models would also be developed. One particular problem this research would 
address concerns possible complications stemming from the use of two or more 
sampling procedures for overlapping frames. Another proposed research 
project would cover work on selection and data collection methodologies. A 
third proposed project would focus on estimation techniques to incorporate 
the results from various verification procedures into the published census 
data. Miskura et al. limit the application of sampling for verification to
procedures that involve reinterviewing census respondents. 



The Importance of Verifying Content 

The concern over completeness of population coverage in the census can 
obscure equally valid concerns over the accuracy of the content. Analysis 
of the fund allocation formula for general revenue sharing, for example, has 
shown that the per capita income component of the formula is more important 
than the population component in determining the distribution of funds among 
jurisdictions (Robinson and Siegel, 1979; Siegel, 1975). Yet reports of 
income in the census, as in household surveys, are known to be subject to 
large errors (Bureau of the Census, 1970, 1973, 1975b). These facts suggest 
that coverage problems should not monopolize resources that could be 
usefully directed to improving the accuracy of content. 

Evaluation research has documented problems in the reporting of many 
other items in the census besides income. The panel believes that serious 
attention should be directed to research that might lead to verification of 
selected content items that have important policy uses as part of the census 
operation itself, instead of waiting until after the census is completed. 
As a corollary, we believe research should be directed to the issue of 
possibly adjusting census reports on the basis of the outcome of 
verification operations. Obviously, not all items can or should be subject 
to this kind of verification. For items designated for verification, it 
seems clear that sampling is necessary to make the process manageable in the 
field and to keep costs within reasonable bounds. 

Because verification and adjustment of census reports have rarely been
used as elements of decennial census methodology, it would be prudent for 
the Census Bureau to set forth and follow a step-by-step research and 
testing program. Extensive research should be concentrated on a few key 
items. 

Recommendation 2.5. We recommend that the Census Bureau give high
priority to research and testing in the area of verification of content 
on a sample basis during the census. We recommend further that the 
verification procedures examined not be limited to reinterviews but 
should include the use of administrative records as well. 



Verification of Housing Items 

In considering the issue of sampling for verification, the panel looked most 
closely at questions on structural characteristics of housing units, 
particularly the item on age of the structure or year when the structure was 
built. (Time constraints precluded examining other important items as 
well.) Age of structure is an important component of one of the two fund 
allocation formulas for the Community Development Block Grant Program. The 
intent of this formula is to direct funds to older, declining cities in 
which the housing stock includes a disproportionate share built prior to 
1940 (Gonzalez, 1980). Reporting of this item in the census has observable 
problems (Bureau of the Census, 1972, 1975a; Katzoff and Smith, 1983). The 
nonresponse rate is fairly high, as is the index of inconsistency (a measure 
of the difference between census reports and reports obtained in 
reinterviews for a sample of census respondents). It has been observed 
that, in some cities, the proportion of housing reported as being built
before 1940 has been increasing over time, whereas in most circumstances
one would expect it to decline, since no structures can be added to the
pre-1940 stock.
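
For a yes/no item such as "built before 1940," the index of inconsistency is commonly computed from the 2-by-2 table that cross-classifies census and reinterview responses: the gross difference rate (the share of units whose two reports disagree) divided by the rate of disagreement expected if the two reports were statistically independent, expressed on a 0 to 100 scale. The sketch below follows that standard binary formula. The cell counts are made up, and a multi-category item such as the full year-built question would need the aggregate version of the index instead.

```python
def index_of_inconsistency(a: int, b: int, c: int, d: int) -> float:
    """Index of inconsistency (0 to 100 scale) for a binary item.

    Cell layout (census report by reinterview report):
        a: yes / yes   b: yes / no
        c: no  / yes   d: no  / no
    """
    n = a + b + c + d
    p_census = (a + b) / n
    p_reinterview = (a + c) / n
    gross_difference_rate = (b + c) / n
    # Disagreement rate expected if census and reinterview were independent.
    expected_if_independent = (p_census * (1 - p_reinterview)
                               + p_reinterview * (1 - p_census))
    if expected_if_independent == 0:
        raise ValueError("index undefined when all units fall in one category")
    return 100 * gross_difference_rate / expected_if_independent


# Hypothetical reinterview table for "built before 1940".
print(round(index_of_inconsistency(30, 10, 20, 140), 1))
```
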

It is not surprising that this item should be poorly reported. People 
who rent their living quarters, particularly if they recently moved into the 
unit, would be unlikely to have accurate information regarding the age of 
the structure. Even homeowners may be uncertain about when their homes were 
built. It would seem that response errors will be largest for buildings
housing several families, such as apartments or condominiums. At the same
time, this information is likely to be available in many jurisdictions, and
far better reported, from administrative sources such as assessment and tax
records. A specific suggestion for how the item
on age of structure could be verified on a sample basis using administrative 
records as part of census operations is outlined at the end of this 
section. 

Sampling for verification during the census is not the only means for 
improving the quality of census reports that should be considered. 
Continuing with our example of structural items for housing units, it is 
possible that more accurate information could be obtained by directing 
questions on these items to respondents believed to be more knowledgeable 
than the occupant of the unit. We understand that the Census Bureau is 
considering testing questionnaires that would ask owners or managers of 
apartment buildings the items on the structure, such as year built, number 
of units, condominium/cooperative status, heating equipment, fuels used,
source of water, etc. This method is used in the censuses of several 
European countries at present (Redfern, 1983). We believe that it is
worthwhile to explore this approach. 

Another approach to consider for some housing items is to obtain them 
instead from administrative records and drop them from the census. If the 
primary use of age of structure, for example, is as input to the Community
Development Block Grant formula, and cross-tabulation of this item with 
other census items is of low priority for users, then a cost-effective 
approach would be to devote resources to gaining access to and improving 
administrative records for the date of construction and eliminate this item 
from the census questionnaire. 

Clearly, there will be many problems in using administrative records to 
obtain housing structure items. Records are kept in many different ways and 
vary in quality and accessibility among jurisdictions. For example, records 
such as tax assessor's rolls are highly computerized in some jurisdictions, 
while maintained in paper form in other areas. The number and types of 
characteristics recorded for each property also vary (see Bureau of the 
Census, 1984a). Nonetheless, investment in research and testing of the use
of administrative records for housing structure items offers the potential
to improve the accuracy of the data and reduce respondent burden in the 
census. Research in this area, to be most beneficial, should investigate 
the use of administrative records in jurisdictions that differ in the nature 
and quality of the relevant record systems. 

Recommendation 2.6. We recommend that the Census Bureau investigate the
cost and feasibility of alternative ways to obtain more accurate data on 
housing items. Possibilities include: (1) obtaining housing structure 
information on a sample basis from administrative records and using this 
information to verify and possibly adjust responses in the census; (2) 
obtaining structure information solely from administrative records and 
dropping these items from the census; and (3) asking structure questions 
of a knowledgeable respondent such as the owner or resident manager. 



A Specific Suggestion for Verifying Age of Housing Units 

The panel offers the following scheme as a suggestion for obtaining more 
reliable data on age of structure and perhaps related housing items. The 
basic concept is to develop a sample of structures from the address lists 
compiled for the census and to obtain data from local administrative records 
about the characteristics of the structures in the sample. It may prove
most feasible to carry out this scheme in urban areas where census address 
listings and identifiers carried on local administrative records can most 
readily be matched. 

Prior to the census, a reasonably complete list of housing unit 
addresses is constructed. Units that have the same basic address (such as 
Apt. A and Apt. B at the same street number) can initially be considered to 
be part of the same structure. Hence, it is possible to draw a sample of 
basic addresses that is a good proxy for a sample of structures. 
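
The idea of a "basic address" can be sketched as a normalization step that drops the unit designator and groups the remaining street addresses. The regular expression below is a hypothetical rule that recognizes only a few designator forms ("Apt.", "Unit", "#") at the end of a free-text address; real census address files are structured and would need far more careful parsing.

```python
import re
from collections import defaultdict

# Hypothetical rule: a trailing unit designator such as "Apt. A", "Unit 3", or "# 2B".
UNIT_DESIGNATOR = re.compile(r"\s*(?:,\s*)?(?:apt\.?|unit|#)\s*[\w-]+\s*$",
                             re.IGNORECASE)


def basic_address(address: str) -> str:
    """Strip a trailing unit designator and normalize spacing and case."""
    stripped = UNIT_DESIGNATOR.sub("", address)
    return " ".join(stripped.upper().split())


def group_by_basic_address(addresses: list[str]) -> dict[str, list[str]]:
    """Group housing-unit addresses into proxy structures."""
    groups: dict[str, list[str]] = defaultdict(list)
    for addr in addresses:
        groups[basic_address(addr)].append(addr)
    return dict(groups)


units = ["12 Oak St Apt. A", "12 Oak St, Apt B", "14 Oak St", "5 Elm St #2B"]
print(group_by_basic_address(units))
```
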

The precise design and size of the sample would depend on the nature of
the costs among other considerations. We outline one possible procedure. 
Assume that the sample of basic addresses or structures is drawn with the 
probability of selection proportional to the estimated number of units in 
the structure. For concreteness, assume that single-unit buildings are 
sampled at a rate of 1 in 10, duplexes are sampled at a rate of 2 in 10, and 
so forth, up to structures with 10 or more housing units that are sampled 
with certainty. Administrative records data for age of structure and 
perhaps other items would then be obtained for the structures in the sample. 
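
The illustrative selection rule can be written directly: a structure's probability of selection is one-tenth of its estimated number of units, capped at certainty from 10 units up. For simplicity the sketch below draws each structure independently (Poisson sampling). That is an assumption made here; a systematic selection from an ordered list would be a common alternative that controls the sample size more tightly.

```python
import random


def selection_probability(estimated_units: int) -> float:
    """1 in 10 per housing unit, capped at certainty for 10 or more units."""
    if estimated_units < 1:
        raise ValueError("a structure must have at least one unit")
    return min(estimated_units / 10, 1.0)


def draw_structure_sample(units_by_address: dict[str, int],
                          rng: random.Random) -> dict[str, float]:
    """Select each basic address independently with its PPS probability.

    Returns selected addresses mapped to their selection probabilities,
    which are needed later for weighting (weight = 1 / probability).
    """
    sample = {}
    for address, units in units_by_address.items():
        p = selection_probability(units)
        if rng.random() < p:  # always true when p == 1.0
            sample[address] = p
    return sample


structures = {"1 A ST": 1, "3 A ST": 2, "5 A ST": 12, "7 A ST": 40}
print(draw_structure_sample(structures, random.Random(1984)))
```
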

The sample of basic addresses or structures can be linked to the sample 
of housing units in the census as follows. Assume that one-fifth of the 
households are to receive the census long form, which asks for age of 
structure and related housing items. Given that the sample of basic 
addresses is specified at the time of the mailing of the census forms, all 
of the long-form households could be selected from those addresses. 
Specifically, one scheme would be to send long forms to all single-unit
structures that are in the sample of basic addresses, to two households in
all other selected structures with fewer than 10 units, and to one-fifth of the
households in all structures with 10 or more units. Recalling the sampling
rates for different sized structures, this will achieve a one-fifth 
long-form sample for structures with more than one unit. To achieve a 
one-fifth long-form sample of single-unit buildings, it will also be 
necessary to send long forms to single-unit structures not in the sample of 
basic addresses. This sampling scheme has the drawback of increasing 
sampling variance for the long form due to the clustered design. However, 
it has the great advantage that the entire long-form sample for people
living in structures with two or more housing units falls within the
sample of basic addresses. Hence, data collected from administrative
records for these structures are available to verify or possibly take the 
place of responses to the census. 
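
The arithmetic behind the one-fifth long-form rate can be checked with exact fractions. A structure with k units (2 to 9) is selected with probability k/10 and then sends long forms to two households, so each of its households receives one with probability (k/10)(2/k) = 1/5. A structure of 10 or more units is always selected, and one-fifth of its households get long forms. Single-unit structures reach only 1/10 through the structure sample, so the other 1/10 must come from the unsampled single-unit structures, which are 9/10 of the total; that implies a supplementary rate of 1/9 among them. This supplementary rate is derived here, not stated in the text.

```python
from fractions import Fraction

TARGET = Fraction(1, 5)  # desired long-form sampling rate


def structure_probability(units: int) -> Fraction:
    """Selection probability of a structure under the illustrative rates."""
    return min(Fraction(units, 10), Fraction(1))


def long_form_rate(units: int, supplement_single: Fraction) -> Fraction:
    """Probability that a given household receives a long form."""
    p = structure_probability(units)
    if units == 1:
        # Sampled single-unit structures always get a long form; unsampled
        # ones are subsampled at the supplementary rate.
        return p + (1 - p) * supplement_single
    if units < 10:
        return p * Fraction(2, units)  # two long forms per selected structure
    return p * TARGET  # one-fifth of households in large structures


# Solve 1/10 + (9/10) * s = 1/5 for the supplementary rate s.
p1 = structure_probability(1)
supplement = (TARGET - p1) / (1 - p1)

for k in (1, 2, 5, 9, 10, 40):
    print(k, long_form_rate(k, supplement))
```
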

Two options are available with respect to the question on age of 
structure in the census. It could be asked on the census form or it could 
be omitted. Assume that the question is retained on the census form. The 
simplest processing method would be to use the value obtained from 
administrative records for all individuals residing in the structures that 
are in the sample of basic addresses and to retain the answers of 
individuals not in the sampled structures. It would also be possible to use 
regression-type procedures to modify responses of individuals in structures 
that are not in the sample based on the information obtained for the sampled 
structures. 
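
One simple reading of "regression-type procedures" is to fit, within the sampled structures, a least-squares line relating the administrative year built to the year reported on the census form, and then to apply the fitted line to reports from unsampled structures. The sketch below illustrates only that idea: it uses ordinary unweighted least squares with a single predictor and treats year built as a number, although the census item is asked in ranges. A real procedure would also need to reflect the unequal selection probabilities and nonresponse to the item.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("reported values have no spread")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjust_reports(reported: list[float], intercept: float,
                   slope: float) -> list[float]:
    """Predict administrative year built from census-reported year built."""
    return [intercept + slope * r for r in reported]


# Hypothetical sampled structures: census-reported vs. administrative year built.
census_reported = [1935.0, 1950.0, 1960.0, 1975.0]
admin_recorded = [1945.0, 1955.0, 1962.0, 1976.0]
a, b = fit_line(census_reported, admin_recorded)
# Apply to reports from two unsampled structures.
print(adjust_reports([1938.0, 1970.0], a, b))
```
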

Now assume the question is not included on the census form. The values 
obtained from administrative records could simply be appended to the census 
data records for persons in structures that are in the sample of basic 
addresses. For persons not in sampled structures, it would be possible to 
assign values obtained from sampled structures located in the same area. 
This should be a very effective procedure in areas in which large groups of 
units, such as apartment complexes or suburban housing developments, were 
constructed at the same point in time. 
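
Assigning values to unsampled structures from sampled structures "located in the same area" could be done in several ways. A hypothetical minimal version, sketched below, gives each unsampled structure the median administrative year built of the sampled structures in the same block (or other small area), and leaves the value missing where no structure in the block was sampled. The choice of the median and of the block as the area unit are assumptions made for illustration.

```python
from statistics import median
from typing import Optional


def assign_year_built(structures: dict[str, str],
                      sampled_years: dict[str, int]) -> dict[str, Optional[float]]:
    """Assign a year built to every structure.

    structures: basic address -> area (e.g., block) identifier
    sampled_years: basic address -> administrative year built, sampled only
    Sampled structures keep their own value; others get their area's median.
    """
    by_area: dict[str, list[int]] = {}
    for address, year in sampled_years.items():
        by_area.setdefault(structures[address], []).append(year)

    result: dict[str, Optional[float]] = {}
    for address, area in structures.items():
        if address in sampled_years:
            result[address] = sampled_years[address]
        elif area in by_area:
            result[address] = median(by_area[area])
        else:
            result[address] = None  # no sampled structure in this area
    return result


blocks = {"1 A ST": "b1", "3 A ST": "b1", "5 A ST": "b1", "2 B ST": "b2"}
print(assign_year_built(blocks, {"1 A ST": 1962, "3 A ST": 1964}))
```
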



THE USE OF SAMPLING FOR COVERAGE IMPROVEMENT

The research plan geared toward developing sample-based coverage improvement 
programs in Miskura et al. includes three projects that are similar to those
proposed for sampling for verification: a project to work on sample design
issues, a project to investigate selection and data collection
methodologies, and a project to conduct research on estimation from the
results of coverage improvement sampling operations. A fourth project is
proposed to conduct research directed at translating the findings from the
estimation research into required additions to the census, for example,
imputation procedures to add "persons" corresponding to the estimated
undercount.

Sampling for coverage improvement has similarities both to sampling for 
follow-up and sampling for verification. Certainly, carrying out specific 
coverage improvement operations on a sample basis has the potential to 
reduce costs and speed the completion of the census, as may the use of 
sampling in the final stages of follow-up. As with sampling for 
verification, sampling for coverage improvement is directed toward improving 
the accuracy of the decennial census without the expense of a 100 percent 
effort. On the negative side, there are problems of estimation raised by 
carrying out coverage improvement programs on a sample basis. The panel has 
not as yet considered the uses of sampling for coverage improvement in any 
detail and hence does not offer recommendations at this time. 






3. EARLY PRETEST PLANS FOR 1990: REVIEW AND COMMENT 



For the first pretests, in spring 1985, leading up to 1990, the Census 
Bureau proposes to test various automated procedures to improve census 
operations in Tampa, Florida, and to conduct a test of a two-stage census 
operation in Jersey City, New Jersey (Bureau of the Census, 1984b). 
Although panel members have not scrutinized plans for the Tampa pretest, the 
panel supports efforts by the Census Bureau to develop improved automated 
procedures that have the potential to speed up data collection, improve 
accuracy, and reduce costs. The panel also supports efforts to automate 
matching operations that may be used in coverage evaluation and coverage 
improvement programs. 

The panel focused most of its attention on the two-stage pretest, since 
this test is related to the charge of the panel to investigate the uses of 
sampling in the census. The panel also developed recommendations related to 
coverage improvement and questionnaire design, which we believe deserve 
early pretesting. 



THE TWO-STAGE PRETEST 

The concept has been put forward in the Congress and elsewhere that census 
operations, particularly in hard-to-enumerate areas, would be improved if 
the collection of the "long-form" information were completely divorced from 
collection of the "short-form" information. In the last two censuses in 
most parts of the country, questionnaires were mailed out to all
households. About 80 percent of the households received a short form that 
contained basic population and housing items, while the remaining households 
received a long form that contained the same basic items as the short form 
plus a larger number of items asked only of the households in the long-form 
sample. The basic proposal of a two-stage census is to mail out the short 
form to all households in the first stage and then, some weeks later, make 
a second mailing of the long form to a sample of households. (The 1960 
census employed a two-stage operation in the mailout/mailback areas. In the 
first stage, the short form was sent to all households, who were asked to 
hold the form for pickup by enumerators a few days later. In the second 
stage, enumerators at the time of picking up the short form left a long form 
at every fourth address to be filled in and mailed back; see Bureau of the
Census, 1966.)



Pros and Cons of the Two-Stage Approach 

It has been proposed that the type of two-stage process to be tested in 
Jersey City might have two advantages over the one-stage operation used in 
1970 and 1980: it would reduce the time required to complete the basic 
count of the population in the first stage and it would obtain more complete 
coverage of the population. 

The reasoning is that households will be more willing to respond if they 
receive the short form and that the census field staff will be able to more 
expeditiously and thoroughly complete the count if they are not distracted 
by having to follow up for responses to long-form questions (see, for
reference, Bounpane, 1984). 

With regard to the first point, there is evidence that mail response 
rates are somewhat but not appreciably higher for short-form recipients than 
for long-form recipients. Overall, the mail return rate in 1980 for short 
forms was about 2 percentage points higher than the rate for long forms. In 
centralized district offices, which were responsible for central cities 
containing hard-to-count areas, the difference was over 7 percentage points 
(Fansler et al., 1981). The 1970 census experienced similar mail return 
rate patterns. 

Two possible disadvantages of the two-stage procedure are higher costs 
and poorer quality of the long-form information collected in the second 
stage. 

The experience in 1980 suggests that the increase in mail return rates 
that might be achieved in the first stage of a two-stage census compared 
with a one-stage operation will not be great enough to produce significant 
reductions in follow-up costs for the first stage. Moreover, nonresponse 
rates to the long form may be substantially higher in a two-stage census, 
because households in the long-form sample resent being asked a second time 
for information or believe that they have already furnished all of the 
information requested in the census. Consequently, there will be higher 
follow-up costs to obtain the long-form information and perhaps adverse
effects on the quality of the information as well. It is likely that many 
first-stage nonrespondents will also be second-stage nonrespondents, and 
total follow-up costs for these households will be roughly doubled. Hence, 
the panel believes that the total costs of the two-stage approach are likely 
to exceed the costs of a one-stage operation. 
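
The panel's cost reasoning can be expressed as a rough model. The sketch below is hypothetical: it counts only follow-up cases, assumes a single cost per nonresponding household, and assumes that long-form households in the two-stage design are followed up separately in each stage. The return rates are placeholders; the 2-point short-form advantage echoes the overall 1980 difference cited above, and the lower second-stage return rate is an assumed illustration of the resentment effect, not an estimate.

```python
def one_stage_cost(households: int, long_share: float, short_return: float,
                   long_return: float, cost_per_case: float) -> float:
    """Follow-up cost when short and long forms go out in a single mailing."""
    nonrespondents = households * ((1 - long_share) * (1 - short_return)
                                   + long_share * (1 - long_return))
    return nonrespondents * cost_per_case


def two_stage_cost(households: int, long_share: float, short_return: float,
                   second_stage_return: float, cost_per_case: float) -> float:
    """Follow-up cost when all get a short form, then a sample gets a long form."""
    first = households * (1 - short_return)
    second = households * long_share * (1 - second_stage_return)
    return (first + second) * cost_per_case


# Placeholder inputs: 100,000 households, 20 percent long-form sample,
# cost expressed in units of one follow-up case.
one = one_stage_cost(100_000, 0.2, 0.77, 0.75, 1.0)
two = two_stage_cost(100_000, 0.2, 0.77, 0.70, 1.0)
print(f"one-stage: {one:,.0f} cases, two-stage: {two:,.0f} cases")
```
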

Overall, the panel doubts the utility of the two-stage approach to 
census enumeration. Benefits in terms of improved coverage and timeliness 
of the basic count appear unlikely to outweigh added costs and problems in 
obtaining the long-form information. 



Research on the Two-Stage Approach

If additional information is required about the advantages and disadvantages 
of the two-stage approach, we recommend conducting research rather than 
field testing. Conducting a pretest of the two-stage approach in 1985 will
be expensive and, as discussed below, may prove inconclusive. A 
cost-effective means to obtaining relevant information would be to 
intensively reanalyze the short-form and long-form records from the 1980 
census. Analysis of indicators by race and type of place, such as mail 
return rates, vacancy rates, reported household size, and number and extent 
of imputations for item nonresponse to the short-form questions, should 
provide useful information about the relative impacts of the two forms on 
the basic count. Similarly, it would be useful to review the experience in 
the 1960 census with a two-stage procedure. 

Recommendation 3.1. We recommend that the Census Bureau analyze the 
short-form and long-form records from the 1980 census to obtain 
information that would be useful in assessing the likely effects on the 
basic count of collecting the long-form items in a separate phase. We
also recommend that the Census Bureau review the experience in the 1960 
census with a two-stage procedure. 

The Alternative Questionnaires Experiment conducted in conjunction with 
the 1980 census included some aspects of the type of analysis recommended 
above (Fansler et al., 1981; Mockovak, 1982a, 1982b, 1983). We believe 
additional analysis focused explicitly on long-form versus short-form issues 
would be worthwhile. 



Design of the Two-Stage Pretest 

Should the Census Bureau decide to go forward with a field test of the 
two-stage approach in 1985, we believe that the test should be carefully 
designed to maximize the ability to detect important differences between the 
two-stage and the one-stage procedures. The current design of the two-stage 
pretest endeavors to replicate the likely census procedures as much as 
possible, even when they do not appear relevant to the objectives of the 
test. For example, as currently proposed, half of the 100,000 housing units 
in Jersey City will be enumerated as in 1980, with mailout of the short-form 
questionnaire to 80 percent of the units and the long form to the other 20 
percent in the usual one-stage operation. The other half of housing units 
will be enumerated via the two-stage procedure. Again, 80 percent will 
receive the short form only and will not receive a second mailing, while 20 
percent will receive the short form in the first mailing and a long form 
about a month later. The long form will reask the short-form questions for 
most of the 20 percent sample and will reask only the household roster for 
the rest (see Matchett, 1984). 

We believe this design is not well calculated to provide the best 
evidence about the comparative advantages and disadvantages of the two-stage 
and the one-stage procedures. The panel believes the sample sizes for the 
two halves of the experiment are too small to conclusively demonstrate 
coverage differences. We suggest an alternative design: half the housing
units are enumerated in a single-stage procedure; the long form is mailed to 
every one of these units. The other half is enumerated in a two-stage 
procedure; the long form is sent to every unit in the second stage. This 
design would maximize the ability to detect differences in response rates 
and coverage as they affect the long-form households, although of course it 
would not permit making all important comparisons between the two 
procedures. On one hand, if receipt of the long form in hard-to-enumerate
areas significantly affects response rates and reduces coverage, this design 
has a better chance to detect such differences. On the other hand, the 
sample sizes are sufficiently large that, if the differences in response 
rates and coverage are not significant for the two approaches using this 
design, the test can be considered conclusive. 
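The gain in sensitivity from the alternative design can be roughed out with a standard two-proportion power calculation. The sketch below is illustrative only. It assumes simple random sampling (the clustering of housing units would make the detectable difference larger), a two-sided test at the 5 percent level with 80 percent power, and a hypothetical common long-form response rate of 75 percent; the function name `min_detectable_difference` is ours. The sample sizes follow the designs described above: 10,000 long-form households per half under the proposed design (20 percent of 50,000) and 50,000 per half under the alternative.

```python
from statistics import NormalDist

def min_detectable_difference(n_per_arm, p=0.75, alpha=0.05, power=0.80):
    """Smallest difference in response rates that a two-sided two-proportion
    z-test detects with the given power, assuming equal-size simple random
    samples in each arm and a common (hypothetical) response rate p."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # standard normal quantile for the power
    se = (2 * p * (1 - p) / n_per_arm) ** 0.5
    return (z_alpha + z_beta) * se

# Proposed design: 20 percent of each 50,000-unit half receives the long form.
proposed = min_detectable_difference(0.20 * 50_000)
# Alternative design: every unit in each half receives the long form.
alternative = min_detectable_difference(50_000)
print(f"proposed:    {proposed:.4f}")
print(f"alternative: {alternative:.4f}")
```

Under these assumptions the detectable difference in long-form response rates falls from roughly 1.7 to roughly 0.8 percentage points, a factor of the square root of 5, because the long-form sample in each half grows fivefold.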

We are also concerned that the currently proposed design will produce 
unnecessarily large adverse effects on the second stage response and on the 
quality of the resulting long-form information. Specifically, we believe
that requiring long-form recipients to repeat their answers to the short 
form questions is likely to discourage cooperation. There is also the 
problem with this approach that the long-form sample will not provide the 
same snapshot of the population as the complete count, because some people 
in the sampled households will be new residents who moved in after the first 
stage. 

If respondents are sent the long-form questions only, with one or two 
identification questions such as the household roster to permit matching 
their long-form information with their short-form replies, cooperation 
should be improved. However, this latter approach has the costs of matching 
and the introduction of matching errors. With this procedure, moreover, 
there will be some sample loss because of people who move between the two 
stages. 

A potentially useful variant to incorporate in the two-stage pretest 
would be to designate part of the two-stage sample to receive second-stage 
questionnaires that include their short-form answers. This procedure has 
the potential to elicit greater cooperation because households in the 
long-form sample will appreciate that they are not being asked to supply the 
same answers twice. We should note, in this regard, that the two-stage 
procedure used in the 1960 census required households in the long-form
sample to fill out the short-form questions again. A difference in the 1960 
procedure from the planned two-stage pretest is that, in 1960, enumerators 
personally dropped off the long forms at the same time that they picked up 
the short forms and hence could explain the procedure to the long-form 
households. 

However, there are two kinds of problems in implementing a test of 
returning short-form answers to households in the long-form sample, at least 
one of which may have overriding importance. The first problem is 
operational, namely that it may be difficult, particularly for the first 
pretest in 1985, for the Census Bureau to develop an efficient means to 
transcribe short-form answers onto the long-form questionnaires. For testing
purposes, it should be possible, at a minimum, for the Census Bureau simply
to reproduce the sample households' short-form questionnaires and attach
them to the long form. 
The second problem is more serious and relates to potential disclosure
of confidential census information. One way to minimize the possibility of 
disclosure would be to mail out the long forms containing the transcribed 
short-form information via first class mail to householders by name with a 
request that the envelope be forwarded if the addressee has moved. Even 
this procedure is likely to result in disclosure in a small proportion of
cases, for example because someone other than the addressee opens the
envelope or short-form information for the wrong household is sent out. We
recognize, moreover, that the absence of disclosure problems in a pretest
is no guarantee that disclosure would not occur during a census, which is
a much larger and more difficult operation to control.

Nevertheless, to test the panel's hypothesis that cooperation in the 
second stage will be higher if respondents do not have to repeat their 
short-form answers, it would be desirable to find a means of returning 
short-form replies to the households in the long-form sample. Early 
investigation is required into methods for readily returning the short-form
answers to respondents while maintaining the absolute confidentiality of 
census returns. 

In summary, we doubt the utility of the two-stage census procedure. 
Nevertheless, if scarce testing funds are to be used for a two-stage 
approach, we believe every effort should be made to design the test so 
that: (1) differences between the two-stage and the single-stage approach 
have the best chance of being detected and (2) the two-stage procedure is 
afforded the best chance to succeed. If this is not done, and if the 1985 
pretest shows inconclusive differences or net disadvantages for the
two-stage approach, we believe that proponents of the two-stage procedure
will be able to argue that more testing is needed. Given that the Census 
Bureau has relatively few pretest opportunities for each census, it is 
critical that the 1985 pretest be designed to provide results that will 
withstand close scrutiny. A two-stage mail census approach represents a 
significant departure from procedures of the last two censuses. The 1985 
pretest should provide results that will support an early decision on 
whether to drop the idea or to proceed with further testing. 

Recommendation 3.2. Should the Census Bureau be committed to testing
the two-stage concept in 1985, we recommend that careful attention be
given to the experimental design to ensure that the method is given the
best opportunity for demonstrating potential benefits. We recommend 
designing the pretest to maximize the ability to detect differences in 
response rates and coverage between the two-stage and the one-stage 
procedures. 



OTHER PRETEST RECOMMENDATIONS 

The panel has considered several other issues related to coverage 
improvement and questionnaire design that we believe deserve early testing. 
The panel's thinking has been directed toward such groups as Hispanics and 
young black males that have traditionally been hard to count and, in the 
case of Hispanics, are hard to identify reliably in the census. 
A Specific Suggestion for Coverage Improvement

In the 1977 pretest in Oakland, California, the Census Bureau tested the 
concept of "network" or "multiplicity" response rules for coverage
evaluation (Sirken et al., 1978). Such rules include asking parents to
provide names and addresses of children and vice versa. Full results were 
never published for the Oakland study, but initial results suggested that 
the address information furnished was not of sufficient quality to warrant 
further investigation of this method as part of a coverage evaluation 
program that included matching of samples of persons to census records. 

However, the panel believes that the concept of generating lists of 
individuals in an area from the census operation itself to use as a 
procedure to improve coverage is worth exploring, at least for 
hard-to-enumerate areas. The procedure would be to ask respondents in the 
census for lists of relatives not living in the household. Information 
needed for nonresident relatives to facilitate locating them and determining 
if they had been included in the census would include address and also basic 
demographic characteristics, such as age and sex. 

The Oakland results suggested that address information supplied by 
parents was somewhat more accurate than information supplied by most other 
categories of relatives. Moreover, parents would probably be the most 
reliable source of information on a critical match item: birth date. 
Hence, asking parents to provide basic demographic information and addresses 
for children not living in the household could improve coverage, 
particularly of hard-to-count groups such as young black males in central 
cities. 

Recommendation 3.3. We recommend, as one procedure for the first
pretest to improve coverage of hard-to-count groups, that the Census 
Bureau add a question asking parents for names and addresses of children 
who are not part of the household. 

Specifically, we propose that a question similar to the following be 
added to the census form: 

Does anyone living in this household have a son or daughter living
somewhere else?   [ ] Yes   [ ] No

If yes, please list sons and daughters below.

Name ______________________________________
     (Last)          (First)         (Middle)

Sex ____   Age ____   Birth Date ______________
                                 (Month - Day - Year)

Address _______________________________________
        (Number and Street, City, State, ZIP)



The object is to improve coverage in hard-to-count areas, and hence it 
would not be cost-effective or even feasible to follow up all children 
reported as not living in the household. Instead, the goal would be to
examine census returns from areas identified as hard-to-enumerate and to 
follow up those children reported by their parents as living in the same 
area. The question suggested above is phrased to ask parents for the 
addresses of all children not living in the household, so that there is no 
opportunity for misinterpretation of which children should be listed. 

The answers to this question would provide a list of individuals that 
can be matched against the census. Presumably the list could be constructed,
and nonmatches followed up (perhaps on a sample basis), during the
census operation. Operational questions for a test include the accuracy of
birth date and address obtained from parents, the method of identifying 
addresses that are from hard-to-enumerate areas and should be followed up, 
the method of locating addresses, the use of different procedures in city 
and rural areas, and the method of sharing information in cities with 
multiple offices. The effects on response rates of asking this question 
also need to be examined. 
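The selection step described above (keep children reported in hard-to-enumerate areas, match them to the census, follow up the nonmatches) might be sketched as follows. This is a hypothetical illustration, not a Census Bureau procedure. It assumes each reported child carries a code for the area of the reported address, that a set of hard-to-enumerate area codes is available, and that an exact match on name, sex, and birth date is adequate to decide whether a child was enumerated. Real matching would have to tolerate misspellings and misreported dates, which is precisely the accuracy question a test would examine. `Person`, `match_key`, and `followup_list` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    last: str
    first: str
    sex: str
    birth_date: str  # "YYYY-MM-DD"
    area: str        # hypothetical code for the area of the (reported) address

def match_key(p: Person) -> tuple:
    # Exact key on name, sex, and birth date: a simplification of real matching.
    return (p.last.strip().upper(), p.first.strip().upper(), p.sex, p.birth_date)

def followup_list(reported_children, census_records, hard_areas):
    """Children reported by parents as living in a hard-to-enumerate area
    who have no match among the census records."""
    enumerated = {match_key(p) for p in census_records}
    return [
        child for child in reported_children
        if child.area in hard_areas and match_key(child) not in enumerated
    ]

census = [Person("Smith", "John", "M", "1962-04-03", "A1")]
reported = [
    Person("Smith", "John", "M", "1962-04-03", "A1"),  # already enumerated
    Person("Jones", "Mark", "M", "1960-11-20", "A1"),  # unmatched: follow up
    Person("Brown", "Ann", "F", "1958-01-15", "B7"),   # outside the target areas
]
print([c.first for c in followup_list(reported, census, {"A1"})])  # ['Mark']
```

If follow-up were done on a sample basis, a random subset of the returned list would be drawn before fieldwork.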



Questions on Race and Hispanic Origin 

For evaluation of the coverage of important race and ethnic groups of the 
population, such as blacks and Hispanics, as well as for analyses of the 
characteristics of these groups from census information, accurate 
identification of race and ethnicity on the census form is required. Over 
the decades, different categories of race and ethnic groups have been listed 
on census questionnaires in response to changing needs for the information. 
Editing rules for handling responses not falling into one of the designated 
categories have also changed. 

The 1980 census questionnaire included one question that identified 14 
separate race and national origin categories, such as white, black, American 
Indian, Filipino, Guamanian, plus an "other" category, plus a separate
question on Hispanic origin. About 40 percent of the Hispanic population in
1980 marked the "other" category for race instead of a category such as
white or black (Passel et al., 1982). In 1980, in contrast to the practice 
in censuses from 1940 through 1970, the Census Bureau did not change 
Hispanic responses to white, but left them in the "other" category. This
practice better reflected individuals' own perceptions of their identity,
but it created a discontinuity with statistics from prior censuses.

The panel knows of no easy answer for reconciling the conflicting 
demands posed by: 

- The need for continuity of time series.

- The need for consistency of census reports with other series, such
as vital statistics. (Census and census-based population estimates
provide denominators for vital rates. Vital statistics are also
used to evaluate coverage via demographic methods. Rules for
reporting and editing race and Hispanic origin are not currently
consistent between vital statistics and the census; see Bureau of the
Census, 1983c; National Center for Health Statistics, 1982a, 1982b.)

- Changing perceptions of ethnic identification and the need to follow
societal preferences in question wording, given that census
information is obtained from individual respondents.

We do not presume to offer specific suggestions for the wording of race and
Hispanic origin questions to improve the consistency or utility of the data,
but we have some comments on methods for question design.

Research on Questionnaire Design 

The Census Bureau does not have many opportunities to test important 
questionnaire changes, such as changes in the race and Hispanic origin 
questions, prior to a census. Moreover, it is expensive to mount full-scale 
questionnaire wording tests, as was done prior to 1970 and 1980 and is 
planned for 1990 in a national content test currently scheduled for 1986.

The focus group technique has been successfully employed to design 
survey questions. This approach, originally developed in market research, 
involves in-depth discussions with small, usually homogeneous, groups. 
Focus groups offer the advantage of being able to probe for underlying 
meanings and hidden associations evoked by different question wording that 
may affect responses in unforeseen ways. This feature may be particularly 
useful for the testing of questions on race and ethnicity. 

As a case in point, prior to the 1980 census the Census Bureau conducted 
numerous tests of different wording of the question on Hispanic origin. The 
various pretests and dress rehearsals tried out variations of this question,
as did the 1976 National Content Test. A number of serious response
problems were encountered. For example, in almost every case in which a
question had a category with the term "American," such as "Central or South
American" or "Central or South Amer. (Spanish)," there was evidence that some
non-Hispanic Americans checked these responses (Fernandez and McKenney, 
1980). The focus group technique would probably have provided evidence of 
this behavior and other response problems. 

The Census Bureau experimented with focus group techniques and other 
laboratory methods of questionnaire design prior to the 1980 census. 
Through focus group sessions and classroom experiments, the Census Bureau 
assessed the response effects of various aspects of questionnaire design, 
including the placement of instructions, the position of particular items in 
the questionnaire, requiring respondents to make machine-readable entries 
for date of birth, and the use of graphics. The Census Bureau also obtained 
reactions to specific questions (see Rothwell, 1983). With regard to race
and Hispanic origin questions, the focus group sessions and classroom 
experiments examined effects on item nonresponse of placing the Hispanic 
origin question immediately following the race question versus separating 
the two items on the questionnaire. However, these experiments were limited 
in number and did not include sessions that focused explicitly on race and 
ethnicity questions. 

We believe that the use of focus groups for questionnaire development of 
sensitive items such as race and ethnicity would be very useful. 
Similarly, focus group techniques could reveal negative attitudes toward 
cooperation with the census among traditionally hard-to-enumerate groups in
the population and suggest ways of modifying these attitudes. 
Recommendation. We recommend that the Census Bureau use the
technique of small focus group discussions as one means of
questionnaire development in addition to other methods that it has
traditionally employed. We also recommend that the Census Bureau
use focus groups that include members of hard-to-count populations
to help devise and assess means of reaching these groups in the
census.
4. COVERAGE EVALUATION METHODOLOGIES FOR THE DECENNIAL CENSUS



The panel investigated two basic types of decennial census evaluation
programs. The first type, labeled coverage evaluation, is concerned with
measuring or assessing the completeness of coverage of the population
count, on a national level and for various subgroups, often defined by
subnational regions, as well as by sex, race, and age characteristics.
This implies measuring overcount as well as undercount for these groups.
The second type of evaluation program, labeled content evaluation, is 
concerned with measuring or assessing the completeness and accuracy of the 
responses to the various questions about characteristics of the 
population, on either the short form or the long form. Chapter 2, on the 
uses of sampling, considered some aspects of content evaluation. This 
chapter examines various methods of coverage evaluation. Yet a third type 
of evaluation program is concerned with assessing the efficiency with 
which census processes are carried out, i.e., quality control. The panel 
has not as yet addressed this important type of census evaluation. 

Before 1980, the evaluation programs implemented by the Bureau of the 
Census had two basic goals: (1) to provide users with an indication of 
the quality of the published data and (2) to provide guidance for the 
improvement of decennial census methodology. The quality of a data set 
can be assessed only in relation to its intended uses. Major uses of 
decennial census data include providing counts for reapportionment and
redistricting and serving as factors in the formulas underlying various
federal programs of fund allocation. Therefore, any differential undercount of
various subgroups or regions gives rise to questions of fairness for those 
subgroups or regions and the possible need for adjustments to reduce 
inequities due to a differential undercount. 

In the last few years, there has been extensive consideration of the 
possibility of using the results of coverage evaluation for the adjustment 
of the population counts. Prior to the 1980 census (see Wolter, 1983),
coverage evaluation programs concentrated on assessing the completeness of 
the count for population subgroups at the national level, whereas 
adjustment must be implemented at a much lower level of geographic 
aggregation. Hence, for adjustment purposes, coverage evaluation programs 
must encompass the question of whether the information obtained is 
adequate for the purpose of modifying population counts in subnational 
geographic areas. The discussion in this chapter of coverage evaluation
methodologies touches on issues related to adjustment. Chapter 5, which 
outlines issues related to adjustment, imputation, and estimation, 
approaches the same problem from another angle. The approach in Chapter 
5 is to consider what one does once the information from coverage 
evaluation is shown to be of use: for example, it discusses what methods 
are available for the modification of census counts and how modified
results are presented to users. The panel does not make recommendations 
on adjustment at this time, but Chapter 5 outlines issues that must be 
considered in any decision to modify census results. 

To aid the Census Bureau in improving its coverage evaluation 
programs, panel members reviewed the paper, "Research Plan On Adjustment 
for the 1990 Decennial Census" (Hogan, 1981). The Hogan paper describes 
many of the issues confronting the Census Bureau in its investigation of 
adjustment. It closes with appendices describing four studies that are 
expected to help direct the Census Bureau in its development of existing 
coverage evaluation strategies, with a view toward their use as programs 
of adjustment. Since the March draft of the "Research Plan on
Adjustment" was prepared, the Census Bureau has modified particulars of the
specific studies proposed for testing. The panel decided for the interim
report to direct its comments to the latest written version of the plans.
All these studies are either ongoing or scheduled to begin soon. A major
focus of this interim report is to make recommendations about the
execution of these study plans.

There are currently four major methods available for evaluating the
coverage of the decennial census:

(1) Pre- or postenumeration surveys (PES), including such surveys as 
the postenumeration program (PEP) used in 1980 (Cowan and Bettin, 
1982); 

(2) Reverse record checks (Gosselin, 1980); 

(3) Demographic analyses (Siegel et al., 1977); and 

(4) The use of administrative records, which includes megalist* 
techniques (Ericksen and Kadane, 1983). 

A pre- or postenumeration survey is an evaluation program that uses a 
sample survey to independently re-enumerate the population. A dual-system 
estimate (see Bishop et al., 1975) may then be used to estimate the total 
population, based on estimates of the number of individuals counted in 
both the survey and the census, as well as those counted by only one,
often under the assumption of the independence of survey and census.
Dual-system estimation is sometimes referred to by the term
capture-recapture.
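A minimal worked example of dual-system estimation may help, using hypothetical counts for a single sample area. The number of people missed by both systems is estimated as (census only) times (survey only) divided by (counted in both), which is justified only if inclusion in the survey is independent of inclusion in the census. The total then reduces algebraically to (census total times survey total) divided by the number matched, the familiar capture-recapture form. The function name is ours.

```python
def dual_system_estimate(in_both, census_only, survey_only):
    """Dual-system (capture-recapture) estimate of the total population.

    Assumes inclusion in the census is independent of inclusion in the survey,
    so the number missed by both is estimated as census_only * survey_only / in_both.
    """
    if in_both <= 0:
        raise ValueError("dual-system estimation needs at least one matched person")
    missed_by_both = census_only * survey_only / in_both
    return in_both + census_only + survey_only + missed_by_both

# Hypothetical area: 950 people counted by the census, 920 by the survey, 874 by both.
census_total, survey_total, matched = 950, 920, 874
n_hat = dual_system_estimate(matched, census_total - matched, survey_total - matched)
print(n_hat)                                  # 1000.0
print(census_total * survey_total / matched)  # same estimate in closed form: 1000.0
print(f"net undercount: {1 - census_total / n_hat:.1%}")  # 5.0%
```

If people missed by the census are also more likely to be missed by the survey (the correlation bias discussed later in this chapter), the independence assumption fails and the estimate of those missed by both, and hence the total, will tend to be too low.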

A reverse record check is "an evaluation program in which a sample of
the population is drawn from a frame created prior to the census, traced 
forward to the time of the census, and matched to the census. The 
proportion of the sample which is unmatched provides an estimate of the 
proportion of the population which was missed in the census" (Childers and 
Hogan, 1983). Usually, the sample is a combination of the following four 
lists: (1) a sample from the previous census, (2) a sample of births in
the intercensal period, (3) a sample of immigrants from the intercensal
period, and (4) a sample of people missed in the previous census as
determined from the previous coverage evaluation program. This technique
has not been used extensively in the United States but is currently the
main method used in Canada.
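The estimate in the quoted definition, the proportion of the sample left unmatched, can be sketched as a weighted proportion across the four frames. The weights, frame labels, and records below are hypothetical. The sketch also assumes that every sampled person was successfully traced and was alive and in the country on Census Day; an actual study must handle deaths, emigrants, and tracing failures separately.

```python
from dataclasses import dataclass

@dataclass
class SampledPerson:
    frame: str     # which pre-census frame the person was drawn from
    weight: float  # hypothetical sampling weight (inverse selection probability)
    matched: bool  # found in the census after tracing forward

def estimated_miss_rate(sample):
    """Weighted proportion of the sample not found in the census."""
    total = sum(p.weight for p in sample)
    missed = sum(p.weight for p in sample if not p.matched)
    return missed / total

sample = [
    SampledPerson("previous census", 100.0, True),
    SampledPerson("previous census", 100.0, True),
    SampledPerson("intercensal births", 50.0, False),
    SampledPerson("intercensal immigrants", 25.0, True),
    SampledPerson("missed in previous census", 25.0, False),
]
print(estimated_miss_rate(sample))  # 0.25
```

The weights matter because the four frames are usually sampled at different rates; an unweighted count of unmatched records (here 2 of 5) would give a different and misleading answer.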

The method of demographic analysis makes an independent estimate of
the population using information on births, deaths, net migration, and
other related information. This estimate is then compared with the census
count to estimate undercoverage.
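In its simplest form, demographic analysis carries a base population forward by the components of change and compares the result with the census count. The figures below are hypothetical and treat the nation as a single group; actual demographic analysis works by age, sex, and race cohorts and draws on many more sources. The loop shows how sensitive the estimated undercount is to the net migration term, which this chapter identifies as a weak point.

```python
def demographic_estimate(base_population, births, deaths, net_migration):
    """Carry a base population forward to Census Day by components of change."""
    return base_population + births - deaths + net_migration

def net_undercount_rate(independent_estimate, census_count):
    """Share of the independently estimated population not counted in the census."""
    return (independent_estimate - census_count) / independent_estimate

census_count = 1_078_000
for net_migration in (40_000, 60_000):  # a low and a high migration assumption
    estimate = demographic_estimate(1_000_000, 150_000, 90_000, net_migration)
    print(net_migration, f"{net_undercount_rate(estimate, census_count):.2%}")
```

A 20,000-person change in assumed net migration moves the estimated undercount from 2.00 to 3.75 percent, which is why gaps in data on legal and illegal migration limit this method.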

Finally, administrative record strategies use one or more national or 
local rosters to develop lists to be matched to the census records to 
estimate net undercoverage. The estimation technique used is often
dual-system estimation. The various lists may be merged beforehand and 
matched to the census records, sampled from and sequentially matched to 
the census (see Ericksen and Kadane, 1983), or completely matched to the 
census individually. This approach has been used on a limited, 
experimental basis in the United States; its potential value as a major 
evaluation technique is the subject of considerable current attention and 
debate. 
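The first strategy just listed, merging the lists beforehand and matching the merged list to the census, might look like the following sketch. It assumes, purely for illustration, that every record carries a common person identifier, so that unduplicating the lists is a set union and matching is a set intersection; in practice records must be linked on names, dates, and addresses, with the attendant matching errors. The identifiers, list names, and function name are hypothetical.

```python
def megalist_dual_system_estimate(census_ids, *admin_lists):
    """Unduplicate administrative lists into one merged list, match it to the
    census, and apply the dual-system formula.

    Assumes a shared (hypothetical) person identifier, so merging and matching
    reduce to set operations.
    """
    megalist = set().union(*admin_lists)  # merged, unduplicated list
    in_both = len(megalist & census_ids)  # merged-list records found in the census
    if in_both == 0:
        raise ValueError("no matches between the merged list and the census")
    return len(census_ids) * len(megalist) / in_both

census_ids = {i for i in range(1, 101) if i % 10 != 0}  # census misses ids 10, 20, ..., 100
tax_list = set(range(1, 31))       # hypothetical list covering ids 1-30
program_list = set(range(21, 51))  # hypothetical list covering ids 21-50 (overlaps 21-30)
print(megalist_dual_system_estimate(census_ids, tax_list, program_list))  # 100.0
```

Here the census misses one person in ten regardless of list membership, so the estimate recovers the hypothetical true total of 100. If the people absent from every list were also the people the census tends to miss, the estimate would be biased downward.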

The methods used in 1980 for coverage evaluation were primarily 
demographic analyses and a postenumeration survey. The Census Bureau's 
pretest plans for 1990 are designed to improve the existing methodologies 
and also to investigate the strengths and weaknesses of the possible 
alternatives. 

The remainder of this chapter is organized as follows. First, a 
section is devoted to assessments of the status of coverage evaluation, 
which provide the foundation for the recommendations that follow. 
Following this, the panel has a recommendation on completing current 
research on postenumeration programs. Next is a section describing the 
four studies outlined in Hogan (1984) and presenting the panel's 
recommendations on each study. Finally, the panel makes recommendations 
related to possible use of coverage evaluation for the modification of 
census results. 



THE CURRENT STATUS OF COVERAGE EVALUATION 

Assessment 4.1. Each of the various methods currently used in the
United States and other countries to measure the completeness of 
census coverage is subject to serious limitations, including biases, 
in measuring the coverage of various population groups. 

All of the four major types of coverage evaluation programs listed above 
are dependent, to a great extent, on one or more operations that have not 
been developed to a satisfactory degree. These operations include
tracing, matching, and the counting or estimation of legal and illegal 
immigration and emigration. Tracing is the process whereby current 
information, including name and address, is acquired for individuals 
starting with information previously obtained, often from a previous 
census or survey. Matching is the determination of which individuals on 
two or more lists are actually the same individual. Most types of 
coverage evaluation programs are also affected by the unwillingness of 
persons missed in the census to report in other surveys, sometimes
referred to as correlation bias.

Whether the potential biases and inaccuracies of the various coverage
evaluation programs for subnational areas are small enough to allow the 
results of the programs to be used for adjustment has been intensely 
debated in the last few years. However, these programs have extremely 
important uses apart from input to adjustment procedures, namely as 
measures of data quality (which includes their use as rough measures of
undercoverage, both nationally and by major demographic groups) and as
indicators of areas for improvement in census methodology. Consequently,
coverage evaluation studies are important, even with their imperfections.

Assessment 4.2. There is at present no reason to expect a breakthrough
in the methodology of coverage evaluation before 1990. However, some 
significant improvements are possible, expected, and important. 

As mentioned above, the most serious problems affecting the 
performance of coverage evaluation programs are: (1) matching, (2)
tracing, (3) estimation of legal and illegal immigration and emigration,
and (4) correlation bias. We have not seen any proposed new techniques
that give any assurance that these problems will be substantially resolved
by 1990. For example, although there is work currently planned to improve 
matching and tracing (Childers and Hogan, 1984; Hogan, 1984), the Census 
Bureau has not described any new methodology that would lead to the 
expectation that the proposed experiments will provide methods greatly 
superior to those currently in use. Also, the work to date of the Panel 
on Immigration Statistics of the Committee on National Statistics 
indicates that, while immigration and emigration data can and should be 
improved, no currently available methods will accurately measure all legal 
and illegal movements across this country's borders. This data gap is 
central to the use of demography and other methods for coverage 
evaluation, and the panel strongly supports efforts to address this 
difficulty. 

Assessment 4.3. There is, at this time, very little information on
the quality of subnational estimates of coverage derived from any of 
the currently used evaluation programs. 

Subnational estimates of coverage are needed for use in adjusting 
population counts. Differential undercounts on a subnational basis may 
cause inequity in representation or fund allocation. The various 
nondemographic coverage evaluation programs currently provide stand-alone
estimates of coverage for, at best, about 20-100 areas, due to the small
sample sizes that can be processed in each of these areas. No reliable
methods currently exist for making subnational estimates of undercount by
demographic analysis due to insufficient data on interstate migration and 
the subnational distribution of legal and illegal net immigration (Siegel 
et al., 1977). 

If it is decided to adjust the counts provided by the decennial census 
in 1990, estimates of coverage will be needed for quite small geographic 
levels, e.g., the 39,000 revenue-sharing districts. To do this, some 
method of disaggregating the information on coverage to geographically
lower levels is required. Among the methods that have been put forward
for accomplishing this are synthetic estimation (see Hill, 1980), more
elaborate regression models, and log-linear models.
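Synthetic estimation, the first method named above, can be illustrated in a few lines. Its key assumption (the synthetic assumption) is that a group's coverage rate, estimated for the nation or a large region, holds in every small area. The local estimate then divides each group's local census count by that group's coverage rate and sums the results. The group labels, counts, and coverage rates below are hypothetical.

```python
def synthetic_estimate(local_counts, coverage_rates):
    """Synthetic estimate of a small area's population.

    local_counts: census count in the area for each group.
    coverage_rates: census count / true population for each group, estimated
    at a higher level (nation or region) and assumed to hold in every area.
    """
    return sum(count / coverage_rates[group] for group, count in local_counts.items())

coverage_rates = {"group A": 0.98, "group B": 0.92}  # hypothetical
local_counts = {"group A": 4_900, "group B": 920}    # hypothetical census counts
print(round(synthetic_estimate(local_counts, coverage_rates)))  # 6000
```

Coverage rates of 98 and 92 percent thus add about 100 and 80 people respectively to this area's census count of 5,820. Any real variation in coverage among areas within a group is ignored, which is the main weakness that the regression and log-linear alternatives try to address.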

In addition to the problems mentioned earlier, time constraints also 
limit the possibilities for using the subnational estimates derived from 
coverage evaluation programs for adjustment of subnational counts for some 
important purposes. None of the current evaluation programs, except that 
of demographic analysis, has been demonstrated to be capable of meeting 
the deadlines imposed by reapportionment and redistricting, which
currently are, respectively, December 31, 1990, and April 1, 1991. 
Opinions differ as to whether alternative, nondemographic evaluation 
techniques making full use of future technology could meet these 
deadlines, possibly extended by a few months. (The pressure by the states 
for redistricting currently is for an earlier deadline.) Nevertheless,
there are other important uses for subnational data that do not have such 
severe time constraints, especially their use in various fund allocation 
formulas. The possibilities of adjustment by various methods to satisfy 
these uses appear more feasible. 



THE COMPLETION OF CURRENT TESTING 

The Census Bureau has in progress a number of studies based on the 1980 
census that promise to provide a great deal of useful information 
pertaining to coverage evaluation and possible adjustment of future 
censuses. 

The Census/CPS/IRS Match Study provides a three-way match that is used 
to form population estimates. Estimates using this three-way match would 
have smaller variance and possibly smaller bias than estimates using the 
two-way match done in PEP. Also, estimates of correlation bias in the PEP 
would be provided (Miskura and Thompson, 1983). Other studies, e.g., the 
Demographic Analysis of National PEP Estimates, Local Area Estimation 
Research, and the Explanatory Analysis of PEP Data (Hogan, 1984), have 
direct implications for the feasibility of adjustment procedures. 

The panel urges that the above tests be completed and fully 
documented, because the results have potential implications with respect 
to the effective design of other field tests currently being planned. The 
panel has an overall concern that the history of tests completed by the 
Census Bureau has not always been available to help in the design and 
consideration of new tests. 

Recommendation 1. We recommend that the Census Bureau assign a high
priority to the completion and reporting of 1980-based tests related 
to coverage evaluation, especially the Census/CPS/IRS Match Study. 
STUDIES OF COVERAGE EVALUATION STRATEGIES



The 1985 Pretest of Postenumeration Survey Methodology

The 1980 postenumeration program (PEP) was performed in an attempt to
obtain information about, among other things, subnational over- and
undercounts.1 The Census Bureau experienced a number of problems in
conducting the 1980 PEP, and its planned pretest on postenumeration survey
methodology (Hogan, 1984: Appendix A) has been designed to try to explore 
ways of overcoming some or all of these difficulties. 

As planned, the test will proceed as follows. A sample of 200 blocks 
in an area designated for a pretest census will be selected, completely 
relisted, and matched to the pretest census records. The matching will be 
a two-way computer match between the sample and the census listings. The 
two-way match (as opposed to a one-way match, which does not determine the 
matching status of each record on both lists) will enable the Census 
Bureau to estimate the overcount as well as the undercount. Nonmatches 
will be followed up by tracing through many different sources, e.g., 
telephone directories, the post office, and local welfare rolls. 
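To make the two-way match concrete, the following is a minimal sketch of the dual-system (capture-recapture) estimator that such a match supports. It assumes that records have already been reduced to exact person identifiers, which a real computer match must construct from names, addresses, and ages, and that census records judged erroneous have been identified during followup. The function name and its inputs are illustrative only, not the Census Bureau's procedure.

```python
def dual_system_estimate(census_ids, sample_ids, erroneous_ids):
    """Dual-system (capture-recapture) estimate for a set of sample blocks.

    census_ids    -- identifiers of persons on the census listing
    sample_ids    -- identifiers of persons on the independent relisting
    erroneous_ids -- census records judged erroneous (duplicates, fictitious
                     persons, persons enumerated at the wrong place)
    """
    census = set(census_ids)
    erroneous = census & set(erroneous_ids)
    correct = census - erroneous                 # correct census enumerations
    sample = set(sample_ids)
    matched = correct & sample                   # found on both lists
    if not matched:
        raise ValueError("no matches: the estimate is undefined")
    estimate = len(correct) * len(sample) / len(matched)
    return {
        "estimate": estimate,
        "omissions_in_sample": len(sample - correct),  # undercount side
        "erroneous": len(erroneous),                   # overcount side
        "net_undercount": estimate - len(census),      # estimate minus raw count
    }
```

For example, with 11 census records of which one is erroneous, a relisting of 10 persons, and 8 matches, the estimate is 10 times 10 divided by 8, or 12.5 persons, and the net undercount is 1.5. Because both lists are classified, erroneous enumerations (overcount) and omissions (undercount) are measured separately rather than only as a net figure.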

The problem areas to be addressed by this pretest are: 

(1) Computer matching; 

(2) Balancing the undercount with the overcount; 

(3) Evaluating the overcount; 

(4) Nonresponse research; 

(5) Alternate questionnaire design; 

(6) Rules on whether the current or the listed resident should be 
enumerated; 

(7) The use of the PEP to benchmark other evaluation methods of 
interest; 

(8) Homogeneous domains and their effect on block sampling; and 

(9) Limited follow-up. 

A few of the above issues require some explanation. Balancing the 
undercount with the overcount refers to developing procedures that treat 
like components of the undercount and the overcount similarly. For 
example, the treatment of movers should be symmetric whether one is 
estimating the undercount or the overcount. The question of whether the 
current or the listed resident should be enumerated in the PES concerns 
movers: whether new residents or the residents listed as present on 
Census Day are counted. The use of homogeneous domains (see Tukey, 1981) 
refers to stratification of the postenumeration survey sample by 
variables thought to be related to the undercount; such stratification 
need not follow political boundaries, not even county or state 
boundaries. 

¹ The 1980 PEP was a special type of postenumeration survey (PES). In 
the PEP, records for persons interviewed in April and August 1980 for the 
Current Population Survey (Bureau of the Census, 1978a) were matched to 
the census records. In this section we use the terms PES and PEP 
interchangeably. In 1990, the successor survey to PEP may not even be 
taken after enumeration, and therefore may not be "postenumeration." 
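The effect of stratifying by homogeneous domains can be illustrated with a short sketch. Assuming hypothetical match counts for two domains whose coverage rates differ, summing separate dual-system estimates gives a different total from a single estimate computed after collapsing the domains; stratifying so that coverage is roughly uniform within each domain is the usual way of limiting this source of bias. The counts below are invented for illustration.

```python
def dse(correct_census, sample, matched):
    """Dual-system estimate from the three counts of a two-way match."""
    return correct_census * sample / matched

def stratified_dse(domains):
    """Sum of separate dual-system estimates, one per homogeneous domain.

    domains -- iterable of (correct_census, sample, matched) count triples
    """
    return sum(dse(c, s, m) for c, s, m in domains)

def pooled_dse(domains):
    """A single estimate computed after collapsing all domains together."""
    c, s, m = (sum(column) for column in zip(*domains))
    return dse(c, s, m)

# Hypothetical counts for two domains with different coverage rates.
domains = [(900, 100, 95), (400, 100, 80)]
print(round(stratified_dse(domains), 1))  # 1447.4
print(round(pooled_dse(domains), 1))      # 1485.7
```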

Of the nine problem areas listed above, certainly some are unrelated 
to one another and therefore can be tested independently of the rest of 
the pretest. However, because many of the remaining factors do interact, 
the panel feels that the test may become confounded and lose its ability 
to inform as to the advantages or disadvantages of the remaining factors. 
There is also no indication that the Census Bureau has identified methods 
and criteria for the evaluation of the many components of this test. 
Furthermore, the likely sample size will be too small to identify the 
differences in alternative methods of estimating the net undercount, 
which, in total, is probably substantially less than 5 percent. 
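The concern about sample size can be made concrete with the usual two-sample formula for the number of units needed to detect a difference in means. The sketch below assumes, hypothetically, that blocks are sampled independently, that the block-level net undercount rate has a standard deviation of 5 percentage points, and that a difference of 1 percentage point between two methods is worth detecting; none of these values comes from the pretest plan.

```python
import math
from statistics import NormalDist

def blocks_per_method(sigma, difference, alpha=0.05, power=0.80):
    """Blocks needed in each of two independent half-samples to detect a
    given difference in mean net undercount rate (two-sided test).

    sigma      -- assumed standard deviation of the block-level net
                  undercount rate (hypothetical; not from the source)
    difference -- smallest difference in rates worth detecting
    """
    z = NormalDist()                      # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (sigma / difference) ** 2
    return math.ceil(n)

print(blocks_per_method(0.05, 0.01))  # 393
```

Under these assumptions each method would need about 393 blocks, well above the 100 each would receive if the 200 sample blocks were split evenly between two methods. Larger assumed differences or smaller block-level variation would reduce the requirement.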

The panel believes that priorities for the PES pretest should be based 
on an error profile of the PEP in 1980, and the most promising 
improvements should be investigated. Suitably modified, this pretest 
might yield useful information on methods for improving the PEP. Finally, 
the sample design is of interest to some members of the panel, as it may provide a 
convenient data set on which some adjustment procedures could be tested. 

Recommendation 4.2. We recommend that the Bureau of the Census narrow 
the scope of the pretest of the PES methodology. We believe that as 
planned it is too ambitious and is an inefficient use of scarce Census 
Bureau staff. We believe that a test limited to the most promising 
improvements would better serve the interests of the Census Bureau in 
determining the effectiveness of changes in PES methodology. 



Research Study on Hard-to-Count Groups 

Demographic analyses of past censuses have indicated a pattern of 
undercoverage such that certain groups, e.g., black men ages 18-40, appear 
to be missed more often than the general population. These same groups 
tend to be missed in independent surveys as well. Therefore, in order to 
ascertain the completeness of coverage for these groups through direct 
enumeration, it is necessary for the coverage evaluation program to make 
use of alternate methods of enumeration. Demographic techniques cannot 
supply the needed information subnationally, as mentioned above. Two 
techniques that have been proposed for measuring underenumeration for 
these groups are megalist methods and reverse record checks. Megalist 
techniques, by matching to the census lists that more fully represent 
members of these groups, can enumerate people who are missed when survey 
techniques are used. Reverse record checks are based on the assumption 
that being missed in the census is strongly age-dependent, so that an 
individual missed now may have been easier to count 10 years ago, or may 
become easier to count 10 years hence. The pretest proposed by the Census 
Bureau on hard-to-count groups (Hogan, 1984: Appendix B) will take a sample 
of 4,000 adult males in each of two studies, one testing a megalist and 
the other a reverse record check. A postenumeration survey will be run 
simultaneously, with the idea that it may be used to augment either of the 
two procedures (since they will be used here to help count the particular 
population of men ages 18-40). 

In the megalist study, several record sources will be merged to create 
a megalist with which to search for people missed by the census pretest. 
The following sources may be used: 

(1) The 1983 Internal Revenue Service Individual Master File; 

(2) Unemployment records; 

(3) Immigration and Naturalization Service files; 

(4) Comprehensive Employment and Training Act (now Job Training 
Partnership Act) files; 

(5) Draft registration files; 

(6) Driver's license files; and 

(7) Other lists, e.g., police blotters or records of local hospital 
admissions. 

The merged list will have to be unduplicated. (For a possible method for 
merging these lists, see Kadane and Lehoczky, 1976.) If sampling from the 
lists is used, the problem of duplication will certainly be reduced 
significantly. In addition, to use dual-system estimation techniques, 
either the merged list will have to be representative of the specific 
population of interest, or the nonrepresentativeness of the merged list 
will have to be estimated. 
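Unduplication of the merged list is, at its core, a record-linkage problem. The sketch below shows only the simplest deterministic version, in which records are treated as the same person when a crudely standardized name and the birth date agree exactly; the field names and the matching rule are assumptions, and this is not the method of Kadane and Lehoczky (1976). Practical unduplication of lists of this size would need probabilistic matching that tolerates spelling variation and missing items.

```python
import re

def normalize(name):
    """Crude name standardization: lowercase, letters only."""
    return re.sub(r"[^a-z]", "", name.lower())

def unduplicate(sources):
    """Merge several lists into one unduplicated megalist.

    sources -- dict mapping a source label to a list of (name, birth_date)
               records; both fields are hypothetical matching variables.
    Returns a dict keyed by (normalized name, birth date) whose values are
    the set of sources on which that person appears.
    """
    merged = {}
    for label, records in sources.items():
        for name, birth_date in records:
            key = (normalize(name), birth_date)
            merged.setdefault(key, set()).add(label)
    return merged

lists = {
    "tax": [("John Smith", "1960-01-02"), ("Ann Lee", "1955-03-04")],
    "drivers": [("JOHN  SMITH", "1960-01-02"), ("Bob Ray", "1962-05-06")],
}
megalist = unduplicate(lists)
print(len(megalist))  # 3
```

Recording the set of sources for each person also shows how much the lists overlap, which bears on whether the merged list can be treated as representative.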

In the reverse record check study, a block sample of the 1980 census 
with maximum overlap with the pretest census area will be drawn. The 
census microfilm will then be scanned for records of males ages 13-35 and 
the information transcribed. Using the address register, 1980 addresses 
will be obtained. These people will then be traced and matched to the 
census pretest. 
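Once tracing and matching are complete, the reverse record check yields a coverage estimate from the resolved cases. The sketch below uses hypothetical outcome codes and makes the strong assumption that untraced persons are enumerated at the same rate as traced persons; how to treat the untraced is one of the main practical difficulties of the method.

```python
from collections import Counter

def reverse_record_check(outcomes):
    """Summarize a traced reverse-record-check sample.

    outcomes -- one status per sampled person: "enumerated", "missed",
                "untraced", or "out_of_scope" (died or emigrated)
    """
    n = Counter(outcomes)
    in_scope = n["enumerated"] + n["missed"] + n["untraced"]
    resolved = n["enumerated"] + n["missed"]
    return {
        "trace_rate": resolved / in_scope,
        # Assumes untraced persons are covered at the same rate as traced ones.
        "coverage": n["enumerated"] / resolved,
    }
```

For instance, with 70 persons enumerated, 10 missed, 15 untraced, and 5 deceased or emigrated, the trace rate is 80/95, about 84 percent, and the estimated coverage rate is 70/80, or 87.5 percent.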

At the conclusion of each half of this pretest, these two methods of 
enumerating hard-to-count groups will be compared with respect to overall 
costs, the number of people found that were missed in the pretest census, 
etc. The major objective is to determine if one or both procedures are 
feasible and also to assess the relative strengths and weaknesses of each 
procedure. 

The panel feels that the megalist half of this test has not been 
described in great enough detail. It is not explained how the difficulty 
in eliminating duplication in such massive lists will be resolved. It is 
unclear how it will be determined that the final megalist is 
"representative." In addition, many of the lists proposed for use (e.g., 
police blotters and unemployment records) have been tried previously with 
poor results (see Bureau of the Census, 1976: 2-8). 

The planned reverse record check does not mirror the performance of a 
reverse record check in the decennial census, primarily because there is 
no accounting for groups missed in the previous census. Finally, it is 
unclear how one would assess the validity of the results. 
Recommendation 4.3. We recommend that the Bureau of the Census not 
proceed with the proposed pretest on hard-to-count groups, unless more 
clear-cut goals and procedures can be developed. However, nonfield 
test research on multilist or composite list methods should continue. 

For example, research is needed on the relative advantages of various 
alternative approaches to the use of administrative lists for the purpose 
of increasing coverage or coverage evaluation. One approach is to merge 
the lists, or samples from the lists, and match to the census. Another 
approach is to separately match all the lists, or samples from the lists, 
to the census, and to each other. Finally, a third approach is to 
sequentially match the lists, or samples from the lists, pairwise to the 
census, as described in Ericksen and Kadane (1983). These possibilities 
and other aspects of megalist methodologies need to be examined, although 
not necessarily through field tests. The Census Bureau is already 
investigating one crucial aspect of megalist methodology in the current 
research on matching (see Childers and Hogan, 1984). 



The Forward Trace Study 

The Forward Trace Study (Hogan, 1984: Appendix C) is designed to test 
various methods for tracking people from their 1980 census addresses to 
their current addresses. The purpose is to determine which tracing method 
would be most cost-effective to use in any reverse record check planned 
for the 1990 decennial census. 

The Forward Trace Study began in October 1981 by taking a sample from 
the 1980 census supplemented by a sample of missed persons derived from 
the PEP. Two other supplemental parts of the sample to be added later are 
subsamples of births and immigrants. The approximate sample sizes for the 
four subsamples are: 

(1) 1980 census 11,900 

(2) People missed 4,000 

(3) Immigrants 2,700 

(4) Births 2,700 

Three different tracing methods are being examined: (1) periodic 
tracing with periodic personal contact, (2) periodic tracing with initial 
personal contact, and (3) tracing only at the end of the period. At the 
end of the period, an independent household interview will be conducted at 
the traced addresses to estimate within-household misses and certain types 
of whole-household misses. The three different tracing procedures will be 
compared for cost and completeness, especially for hard-to-enumerate 
groups. One concern is that the people subject to the more intensive 
tracing procedures may become sensitized to the census, and therefore may 
be enumerated with greater frequency than the general population. This 
reverse correlation bias would make it difficult to use dual-system 
estimation, which assumes that inclusion in the sample is independent of 
inclusion in the census. 
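The direction of this bias can be seen with simple arithmetic. In the hypothetical example below, the true population is 1,000 and the census counts 900 of them. If the sample of 200 is independent of the census, about 180 of its members are also counted and the dual-system estimate recovers 1,000. If intensive tracing raises the sample's census coverage to 98 percent (196 persons, who are included in the 900 counted), the estimate falls to about 918, understating the population. All figures are invented.

```python
def dse(census_count, sample_size, matched):
    """Dual-system estimate; valid only if census capture is independent of
    inclusion in the sample."""
    return census_count * sample_size / matched

# Hypothetical population of 1,000; the census counts 900 (90 percent).
# Independent sample of 200: about 90 percent (180) are also counted.
print(dse(900, 200, 180))            # 1000.0
# Sensitized sample: intensive tracing raises their census coverage to 98 percent.
print(round(dse(900, 200, 196), 1))  # 918.4
```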
The success of the reverse record check in Canada has suggested the 
use of a similar procedure in the United States. However, there are major 
differences that may reduce the efficacy of this methodology in the United 
States. Some of these are cultural differences in the populations, 
differences in immigration and emigration rates, both legal and illegal, 
and the time lag between censuses, which is 10 years in the United 
States compared with 5 in Canada. The Forward Trace Study 
principally addresses the difference in frequency of the American and 
Canadian censuses. 

The panel feels that the Forward Trace Study is well thought out and 
likely to yield significant information as to the feasibility of using a 
reverse record check to evaluate the completeness of coverage of the 1990 
decennial census. 

Recommendation 4.4. We recommend that the Bureau of the Census 
proceed with the Forward Trace Pretest as planned, because it should 
yield valuable information. 



The Reverse Record Check Pretest 

The fundamental idea to be tested in the proposed reverse record check 
pretest (Hogan, 1984: Appendix D) is the possibility of using a 
pre-enumeration survey to match to the census in order to measure 
coverage. One of the difficulties with a standard reverse record check is 
the length of the intercensal period, which makes tracing difficult. 
Presumably, if the tracing is attempted closer to the time of the census, 
fewer difficulties in tracing would be experienced. However, with a 
pre-enumeration survey, one cannot create as complete a sample as is 
possible with a true reverse record check, in which one of the components 
of the created sample is a representation of people missed in the previous 
census. 

There are two parts to this pretest. After a pretest census area is 
identified, a sample for the reverse record check test would be taken from 
the 1980 census. Household clusters would be assigned randomly to the two 
parts. Stratification based on minority percentage and other variables 
related to undercount would be used to ensure balance. For Part A, 1980 
census questionnaires would be looked up. For Part B, a house-to-house 
pre-enumeration survey would be conducted. Tracing for both samples would 
begin immediately after the interviewing for Part B was completed. 

One year after both processes are finished, the two lists would be 
matched to the 1990 census pretest. Unmatched people would be followed 
up. The total sample would include approximately 6,000 people: 3,000 for 
the pre-enumeration survey-based reverse record check and 3,000 for the 
1980 census-based reverse record check. 

The panel feels that Part A is virtually identical to tests included 
in the Forward Trace Study. Part B has some weaknesses of design that 
should cause it to have a much lower priority with respect to other needed 
testing. The one-year separation between the first contact for the PES and 
the time of the census pretest is short, given that in the decennial 
census the separation would be likely to be two years or longer (Hogan, 
1984). The deficiency of the coverage achievable in this pretest compared 
with a true reverse record check is a matter of concern. That is, how 
could one include some representation of people not counted in the 
pre-enumeration survey? This representation underlies much of the benefit 
of the reverse record check methodology. The Census Bureau's research 
plan notes that the major advantage of this test is the identification of 
troublesome subgroups for the reverse record check. However, the panel 
believes that the Forward Trace Study should be able to provide much of 
this information. 

Recommendation 4.5. We recommend that the Bureau of the Census not 
proceed with the proposed reverse record check pretest, since it will 
add little if any information to what the Forward Trace Study will 
provide. 



OTHER RECOMMENDATIONS RELATED TO COVERAGE EVALUATION 

The panel has considered some of the issues that need to be addressed if 
the results of coverage evaluation programs are to be useful in modifying 
census population counts. The panel has two recommendations in this area, 
one related to research on the feasibility of developing coverage 
estimates for small geographic areas and the other related to research on 
the feasibility of developing timely coverage estimates. 

Recommendation 4.6. We recommend that the Bureau of the Census 
perform research as soon as possible on the feasibility of the 
development of models for subnational estimates of under- and 
overcoverage. 

This recommendation has several aspects. First, we suggest starting 
with the national age-race-sex undercount estimates derived from 
demographic analysis for 1980 and deriving from them, through synthetic 
and related means, state-level estimates. (For the purposes of this 
discussion, we use the term "synthetic estimate" to indicate any procedure 
that estimates undercoverage for demographic groups in small areas by 
carrying down the estimates derived for these groups for larger areas.) 
Comparison of the synthetic estimates with the "direct" PEP-derived 
undercount estimates for states should then be made to see whether the 
results shed light on the feasibility of using synthetic estimates based 
on national demographic estimates of the undercount to produce state and 
substate undercount estimates. 
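A synthetic estimate of the kind described can be sketched in a few lines. The example assumes that undercount rates are expressed as a share of the true population, u = (N - C)/N, so that a group's census count C is inflated by dividing by 1 - u; the group labels, rates, and counts are hypothetical, not 1980 demographic analysis results.

```python
def synthetic_estimate(census_counts, undercount_rates):
    """Carry national group undercount rates down to one area.

    census_counts    -- census count for each group in the area
    undercount_rates -- national rate u_g = (N_g - C_g) / N_g for each group
    Returns the estimated true population sum_g C_g / (1 - u_g).
    """
    return sum(c / (1.0 - undercount_rates[g]) for g, c in census_counts.items())

# Hypothetical rates for two age-race-sex groups and one state's counts.
rates = {"group_a": 0.02, "group_b": 0.08}
state = {"group_a": 9_800, "group_b": 4_600}
estimate = synthetic_estimate(state, rates)
print(round(estimate))                           # 15000
print(round((estimate - 14_400) / estimate, 3))  # 0.04
```

Here a state with a census count of 14,400 receives a synthetic population estimate of 15,000, an implied undercount of 4 percent, which could then be compared with the state's direct PEP estimate.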

The following approach should be explored as well. First, the United 
States should be divided into two (or three) blockings of about 20-60 
relatively homogeneous and not necessarily contiguous domains (Tukey, 
1981). Then, using the first blocking, a regression model should be 
estimated, using from three to six covariates, which fits the PEP 
undercount estimates derived for the domains. The same should also be 
carried out for the second blocking (and perhaps the third), attempting to 
use a different set of covariates. Estimates for substate regions would 
make use of synthetic techniques based on the regression estimates for the 
homogeneous domains. Then the undercount estimates for the two (or three) 
models should be compared in a variety of ways. It would be interesting 
to see whether the substate regression estimates sum to the state-level 
PEP estimates. The effect of these estimates on redistricting or 
reapportionment could also be examined. The difficulty with this approach 
is that there are no "true" values. Nevertheless, this type of 
investigation would provide some clues on model robustness. 
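The regression-then-synthetic approach can be sketched with ordinary least squares. In the sketch below, the covariates, the domain undercount rates, and the substate counts are all hypothetical, and the domain rates are generated from an exact linear relation purely so that the fitted coefficients can be checked; real PEP domain estimates would carry sampling error, and a weighted fit reflecting their differing precision would likely be preferable. The substate step assumes the rate definition u = (N - C)/N.

```python
import numpy as np

def design(covariates):
    """Design matrix: an intercept column followed by the covariates."""
    x = np.asarray(covariates, dtype=float)
    return np.column_stack([np.ones(len(x)), x])

def fit_domain_model(covariates, undercount_rates):
    """OLS fit of domain-level undercount rates on covariates."""
    y = np.asarray(undercount_rates, dtype=float)
    beta, *_ = np.linalg.lstsq(design(covariates), y, rcond=None)
    return beta

def predict_rates(beta, covariates):
    """Regression-predicted undercount rates for the given domains."""
    return design(covariates) @ beta

def substate_estimate(census_by_domain, domain_rates):
    """Carry domain rates down to a substate area: sum_d C_d / (1 - u_d)."""
    c = np.asarray(census_by_domain, dtype=float)
    u = np.asarray(domain_rates, dtype=float)
    return float(np.sum(c / (1.0 - u)))

# Hypothetical covariates (e.g., two area characteristics) for six domains.
cov = [[0.1, 0.3], [0.2, 0.5], [0.3, 0.2], [0.4, 0.6], [0.5, 0.4], [0.6, 0.7]]
rates = [0.01 + 0.05 * a + 0.02 * b for a, b in cov]
beta = fit_domain_model(cov, rates)
# A substate area with 1,000 people counted in domain 1 and 500 in domain 2.
print(substate_estimate([1000, 500], predict_rates(beta, cov[:2])))
```

Repeating the fit for a second blocking with different covariates and comparing the resulting substate estimates is the robustness check suggested above.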

A third possibility was discussed but not uniformly supported by the 
panel. Several areas could be specially chosen for a census pretest that 
represent a wide diversity of values for some covariates that are 
considered strongly related to the undercount. Coverage estimates could 
then be made for these areas, presumably by using a variety of coverage 
evaluation techniques, including the PEP augmented by the use of 
administrative lists. A regression model using the above covariates would 
then be fitted to the PEP estimates for the majority of these regions, 
setting aside a validating sample. This procedure could assist in 
assessing the feasibility of a modeled undercount adjustment. At the same 
time, demographically based synthetic estimates could be developed and 
compared with the regression estimates. A major problem with this 
approach is that the census pretest would be carried out, by necessity, 
for a relatively limited region, and the results could not be generalized 
for use across the entire United States. There are other serious problems 
related to the generation of PEP-type estimates for small clusters. 

The panel makes one additional recommendation that relates to the 
previously discussed concern regarding the time limitation inherent in any 
adjustment program. The basic idea of the recommended research is to 
examine the possibility of: (1) the use of earlier months (e.g., December 
1989 and April 1990), instead of April and August, as the survey months 
for the PEP or (2) the fast matching of people "forward traced" prior to 
the census. 

It is important to clarify the distinction between the research the 
panel is calling for here, which includes the possibility of a 
pre-enumeration survey, and the recommendation for the cancellation of the 
reverse record check pretest, which includes a test of a pre-enumeration 
survey. The panel is in favor of the testing of a pre-enumeration survey 
taken as close to the census as possible, possibly between one and four 
months prior to the census, to consider as the basis for coverage 
estimates derived using PES methodology. However, as the time period 
between survey and census lengthens, dynamic factors such as population 
growth and redistribution of the population due to migration cause 
pre-enumeration surveys taken much earlier to be less worthwhile for 
purposes of coverage evaluation using PES methodology. Thus, the one- or 
two-year separation between the survey and the census under test in the 
reverse record check pretest is not recommended by the panel, as stated in 
Recommendation 4.5. Similarly, the panel is in favor of testing 
procedures to expedite the completion of reverse record checks, especially 
for the fast matching of people, as stated in Recommendation 4.4. 
However, the loss of the representation of people missed in the previous 
census resulting from the use of a pre-enumeration survey taken one or two 
years before the census is too large to justify the gain of a higher 
percentage of people successfully traced. 
Recommendation 4.7. We recommend that the Bureau of the Census 
explore the logistical problems, through a field test if necessary, 
involved in conducting a PEP, or a type of reverse record check, that 
could supply subnational estimates of coverage by December 31 of the 
census year or April 1 of the following year. 
5. ISSUES IN ADJUSTMENT, IMPUTATION, AND ESTIMATION 



The panel directed part of its attention to a multifaceted topic 
encompassing the issues of estimation, imputation, and adjustment. These 
topics are related. Imputation makes use of a model (often only implied) 
and is therefore a form of estimation. Any adjustment methodology would 
also need to make use of statistical estimation in order to combine census 
data and the information from coverage evaluation programs. The central 
idea identified by the terms adjustment, estimation, and imputation is the 
modification or enhancement of the responses elicited in the census. 

Currently, the Census Bureau adjusts the responses obtained from the 
census primarily in three ways: (1) hotdeck imputation, in which 
responses randomly selected from similar respondents are used to fill in 
missing information on incomplete questionnaires, (2) iterative 
proportional fitting, described below, to weight the sample so that 
selected aggregates of long-form responses agree with corresponding 
aggregates of short-form responses, and (3) the imputation of whole 
persons on a random basis for housing units believed to be occupied but 
for which there is no information about the occupants. These three types 
of modification are consistent with the point of view held by the Census 
Bureau that the information produced be internally consistent, a term 
defined below. 
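The first of these methods, hotdeck imputation, can be sketched as follows. The version below draws donors at random from respondents in the same imputation class, as the description above indicates; the class definition, field names, and flag are hypothetical, and the Census Bureau's production procedure is not reproduced here.

```python
import random
from collections import defaultdict

def hot_deck(records, class_key, item, rng=None):
    """Random hotdeck imputation of one item within imputation classes.

    records   -- list of dicts; a missing response is stored as None
    class_key -- function mapping a record to its imputation class
    item      -- name of the item to impute
    Returns new records with an added "imputed" flag; inputs are not modified.
    """
    rng = rng or random.Random(0)         # fixed seed for reproducibility
    donors = defaultdict(list)
    for rec in records:                   # collect reported values by class
        if rec[item] is not None:
            donors[class_key(rec)].append(rec[item])
    completed = []
    for rec in records:
        filled = dict(rec, imputed=False)
        if filled[item] is None:
            pool = donors.get(class_key(rec), [])
            if pool:                      # leave missing if the class has no donors
                filled[item] = rng.choice(pool)
                filled["imputed"] = True
        completed.append(filled)
    return completed
```

For example, with imputation classes defined by tenure, a missing number of rooms for an owner-occupied unit is filled with a value reported by a randomly chosen owner, never by a renter.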

More generally, adjustment and/or imputation can take on a variety of 
forms due to the multiplicity of purposes for which the census (or 
modified census) data are used. These purposes are as diverse as 
providing population counts for apportionment and redistricting, 
information as inputs to various fund allocation programs, data for market 
research and local planning, and data on small groups and small areas for 
various other needs. Thus, the issues of adjustment and imputation have 
many sides, and their discussion can take many forms. 

The above factors introduced a degree of complexity into our 
discussions of the issues surrounding adjustment and imputation. The 
panel decided to use this interim report to detail research areas and 
issues for further investigation, with the expectation of arriving at 
recommendations and theoretical or procedural advances in the final 
report. In the remainder of this chapter, we provide a framework for 
further research on the topics of adjustment and imputation of census 
data. 

MAJOR ISSUES 

The panel identified four issues as central to deliberations on the topics 
of adjustment and imputation: (1) the consistency of the information 
produced by the Census Bureau, (2) the use of ancillary data sets, 
(3) approaches to statistical estimation that should be considered, and 
(4) operational constraints. In what follows we discuss each of these 
issues and their relationship to adjustment and imputation with respect to 
decennial census data. 



Internal Consistency 

Internal consistency refers to the idea that estimates released to the 
public should, to the maximum extent possible, satisfy various 
relationships that would be evidenced were all census questionnaires 
complete and accurate. For example, one of the ways in which census data 
are released to the public is in the form of cross-tabulations or 
contingency tables. If there were no nonresponse to the census 
questionnaires, the elements of every table would add to the totals of 
that table. This is an example of internal consistency. 

There exists a continuum on which the position of complete internal 
consistency represents one extreme. This extreme almost certainly implies 
in practice that any methodology that adjusts for nonresponse take the 
form of some kind of imputation, that is, the construction of additional 
pseudorespondents to be added to the raw census data file and replacement 
of each and every respondent's missing data items with imputed values. 
Complete internal consistency at least implies that a consistent 
imputation must be theoretically possible. 

Advocates of the other extreme view, which could be called "handling 
each possibility for adjustment as a separate issue," would argue for 
gross adjustment and estimation procedures, rather than individual 
pseudorespondent imputation. This argument proceeds from the assertion 
that the population counts are merely numbers, not people, and as such are 
amenable to any appropriate mathematical process. Thus, in the example 
given above on contingency tables, the elements of a table need not add to 
its totals. One issue to be explored is under what conditions there are 
advantages to departing from internal consistency. 



Ancillary Data 

Ancillary data are data collected in other programs that provide 
information similar to the information asked for on the decennial census 
questionnaires. Such data include information collected from coverage and 
content evaluation programs and also information collected by other 
government agencies, e.g., the Internal Revenue Service. 

The panel considers an important area for investigation to be the 
listing and examination of ancillary data sets available to the Census 
Bureau. This examination might include investigations into their quality, 
costs of use, any legal and social constraints on their use such as 
confidentiality, and how these sources of data might be used to augment 
census data. In addition, we consider the design of new ancillary data 
sources for the modification of census data to be within the purview of 
the panel. 

The use of ancillary information introduces an additional concept of 
consistency, that of consistency or near consistency with respect to 
values developed from the use of data collected in other programs. For 
example, data from coverage evaluation programs can give additional 
information as to the values for the totals of a contingency table. If 
the values derived from census responses for these totals differ greatly 
from the values derived from the coverage evaluation program, it may be 
said that a type of consistency has been violated. Certainly, this is not 
the same type of consistency as mentioned before, since different data 
collection schemes will have different universes, data definitions, 
reference periods, etc., that would preclude insisting on any strict form 
of consistency. 



Approaches to Statistical Estimation 

As a result of an inquiry into the value of consistency, it may become 
apparent that some type of estimation model, different from that implicit 
in (hotdeck) imputation, is desirable. At this point, four different 
approaches to a solution of this problem are believed to have promise: 
(1) iterative proportional fitting, (2) model-based estimation, (3) 
multiple imputation procedures, and (4) a hierarchical Bayesian approach. 

Iterative proportional fitting (Bishop et al., 1975) is a method for 
forcing the elements of a contingency table to add to the row and column 
totals. Thus, if a total provided by a coverage evaluation program is 
believed to be extremely accurate, all lower-level counts from the census 
could be modified so that they add to it. 
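A minimal sketch of iterative proportional fitting for a two-way table follows. It alternately rescales rows and columns until both sets of margins are met; it assumes all cells are positive and that the target row and column totals have the same grand total. The table and totals are hypothetical.

```python
import numpy as np

def ipf(table, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting of a two-way table to given margins."""
    fitted = np.array(table, dtype=float)          # work on a float copy
    rows = np.asarray(row_totals, dtype=float)
    cols = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        fitted *= (rows / fitted.sum(axis=1))[:, np.newaxis]  # match row totals
        fitted *= (cols / fitted.sum(axis=0))[np.newaxis, :]  # match column totals
        if np.allclose(fitted.sum(axis=1), rows, rtol=0, atol=tol):
            break                                  # rows still match: converged
    return fitted

census_table = [[10, 20], [30, 40]]                # hypothetical census counts
adjusted = ipf(census_table, row_totals=[35, 75], col_totals=[45, 65])
print(adjusted.round(2))
```

Because every adjustment is a rescaling of whole rows or columns, the procedure preserves the cross-product ratios of the original census table while forcing its margins to the externally supplied totals.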

An example of model-based estimation (Cassel et al., 1983) models 
nonresponse with linear models. Variables are selected that are 
considered to be linearly related to the nonresponse mechanism. For 
example, a model of this type would be able to assign a probability of 
nonresponse to groups of respondents. Then, the responses are reweighted 
to compensate for the missing information. These models are non-Bayesian, 
that is, they are derived from observed frequencies. 
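A simple special case of this idea is the weighting-class adjustment sketched below: the response rate within each class serves as the estimated probability of response, and each respondent receives the inverse of that rate as a weight. A linear probability model with one indicator variable per class reproduces exactly these class response rates, but the sketch is not the estimator of Cassel et al. (1983), and the classes and data are hypothetical.

```python
from collections import Counter

def nonresponse_weights(units):
    """Weighting-class adjustment for nonresponse.

    units -- list of (weighting_class, responded) pairs, base weight 1 each.
    Returns a dict mapping class -> adjusted weight for its respondents.
    Classes with no respondents are omitted; in practice they would be
    collapsed with neighboring classes.
    """
    totals = Counter(cls for cls, _ in units)
    respondents = Counter(cls for cls, responded in units if responded)
    # Weight = 1 / (class response rate) = class total / class respondents.
    return {cls: totals[cls] / respondents[cls] for cls in respondents}
```

With four units in class A of which two respond, and three units in class B all of which respond, respondents in A receive weight 2 and those in B weight 1, so the weighted respondents again total seven units.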

Multiple imputation procedures (Rubin, 1978) attempt to avoid the 
possibility of a nonrepresentative imputation by repeating the imputation, 
at the same time providing an estimate of the variance due to imputation. 
One major advantage of multiple imputation is that it can be accomplished 
with minor changes to software that is currently used for one-time 
imputation. 
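After the imputation has been repeated m times and each completed data set analyzed, the results are usually combined with the rules associated with Rubin's approach: the point estimate is the average of the m estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below applies these rules to hypothetical inputs.

```python
from statistics import mean, variance

def combine_imputations(estimates, variances):
    """Combine m completed-data analyses into one estimate and variance."""
    m = len(estimates)
    q_bar = mean(estimates)          # combined point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 divisor)
    total = w + (1 + 1 / m) * b      # total variance, including imputation error
    return q_bar, total
```

With three imputations giving estimates of 10, 12, and 14, each with variance 1, the combined estimate is 12 and the total variance is about 6.33, most of it due to imputation.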

Finally, the hierarchical Bayesian approach (Ericksen and Kadane, 
1983) provides a framework for optimally combining estimates with unknown 
but estimable precision. This approach would allow one to combine row and 
column totals produced for a contingency table from census data, with row 
and column totals derived for the contingency table from ancillary data, 
given that one could estimate the relative precision of the two totals. 
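The basic operation underlying such a combination is inverse-variance weighting, sketched below for two estimates of the same total. This is only the simplest building block: it assumes the two estimates are unbiased, independent, and have known variances, whereas the hierarchical Bayesian approach treats the precisions as unknown and estimates them. The figures are hypothetical.

```python
def combine_totals(x1, v1, x2, v2):
    """Inverse-variance (precision-weighted) combination of two estimates.

    x1, v1 -- census-based total and its (assumed known) variance
    x2, v2 -- ancillary-data total and its (assumed known) variance
    """
    p1, p2 = 1.0 / v1, 1.0 / v2                  # precisions
    estimate = (p1 * x1 + p2 * x2) / (p1 + p2)
    return estimate, 1.0 / (p1 + p2)             # combined estimate and variance
```

For example, a census-based total of 100 with variance 4 and an ancillary total of 110 with variance 1 combine to 108 with variance 0.8; the more precise source receives four times the weight.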
The panel intends to compare these approaches in terms of their ease 
of implementation, considering cost and time constraints, as well as in 
terms of the validity of the assumptions that underlie each approach. A 
critical issue to be addressed is that of the appropriate criterion or 
figure-of-merit to be used in assessing the goodness of the adjustment 
procedure. Kadane's (1983) work adducing loss functions for each of the 
various uses of the census, and Tukey's (1983) recommendations on 
yardsticks of imperfection, provide some directions for research. 



Operational Constraints 

The important factors of time, cost, and staff requirements for the 
various approaches, as mentioned above, will be considered in assessing 
the strengths and weaknesses of the various approaches to adjustment and 
imputation. For example, many current procedures, including imputation, 
are conducted under the operational constraint that the computation 
require no more than a few passes through the census data file. In 
addition, current estimation procedures are often interpretable as simply 
a reweighting of existing records. This interpretation facilitates the 
implementation of these forms of estimation. New developments that do not 
satisfy these operational constraints may have increased costs that may 
argue against their use. 

The issues of privacy and confidentiality present legal and 
philosophical constraints on adjustment and imputation, especially in the 
release of small-area data. However, for small areas it is convenient to 
create the desired cross-tabulations from data files of individuals rather 
than from estimates based on available, and less sensitive, higher-level 
cross-tabulations. In this area the notion of "error inoculation," that 
is, the introduction of random noise to avoid disclosing information, is 
under examination. 
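A very simple form of error inoculation is sketched below: small random integers are added to each published cell count. The noise distribution (uniform between -2 and 2) and the clipping at zero are illustrative assumptions, not a Census Bureau disclosure-avoidance rule. Note that the noisy cells need no longer add to published totals, which ties this issue to the question of internal consistency discussed earlier.

```python
import random

def inoculate(counts, max_noise=2, seed=None):
    """Add small random integer noise to each published cell count.

    Noise is drawn uniformly from -max_noise..max_noise (a hypothetical
    choice) and results are clipped at zero so no count is negative.
    """
    rng = random.Random(seed)
    return [max(0, c + rng.randint(-max_noise, max_noise)) for c in counts]
```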



ADDITIONAL AREAS IN NEED OF RESEARCH 

Three related issues, which may not be among those normally suggested by 
the terms adjustment, imputation, and estimation, are issues the panel 
plans to examine. 

The first issue for investigation is the development of a manual for 
census data users, with references to data items and sources that can be 
used to check and validate the user's analysis of the data and/or possibly 
adjust the published census data. This manual could be an extension of 
information currently provided in Census Bureau user guides and file 
documentation (Bureau of the Census, 1982b; 1983c). 

The second issue for research is the estimation of means and totals 
for characteristics appearing only on the long form, that is, not 
collected on a 100 percent basis. For such characteristics, some small 
areas have estimates with high sampling variability. There exist methods 
to reduce this variability by the use of information for more aggregated 
regions, in effect borrowing some stability at the cost of increases in 
bias. This issue is separate from those of adjustment and imputation; it 
involves the applicability of regression estimates and empirical Bayesian 
methodology (Fay and Herriot, 1979). 
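The empirical Bayes idea can be illustrated with the composite estimator of the Fay-Herriot model: each area's direct estimate is shrunk toward a regression (synthetic) prediction, with weight A/(A + D) on the direct estimate, where D is the sampling variance of the direct estimate and A is the variance of the true values around the regression. The sketch assumes A and the regression prediction are already known; in practice both are estimated from the data. Inputs are hypothetical.

```python
def composite_estimate(direct, variance_direct, synthetic, model_variance):
    """Shrink a direct small-area estimate toward a regression prediction.

    direct          -- direct (long-form sample) estimate for the area
    variance_direct -- its sampling variance D
    synthetic       -- regression (synthetic) prediction for the area
    model_variance  -- variance A of true values around the regression
    """
    gamma = model_variance / (model_variance + variance_direct)
    return gamma * direct + (1 - gamma) * synthetic
```

An area whose direct estimate is very noisy (large D) is pulled strongly toward the regression value, while an area with a precise direct estimate keeps nearly its own value; this is the trade of variance for bias described above.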

The third issue identified as in need of research is a reconsideration of 
the mode of presentation of census data. One alternative is the mode used 
in the Australian census (Doyle, 1980), in which adjusted population 
totals to be used for purposes of political representation are published 
by state, but data on characteristics are not altered. If adjusted 
figures are provided for selected uses, information could be provided to 
the user that would indicate how adjustment could be made for other uses 
of census data. This issue encompasses the presentation of census data in 
the form of public use microdata files as well as tables. The problem of 
the appropriate form of public use microdata files has two dimensions. 
The first concerns the benefit of providing raw data only versus providing 
raw and imputed data. (The Census Bureau currently flags which data are 
imputed and which are not.) The second dimension concerns the benefit of 
providing data only versus providing data and procedures or programs for 
aggregation, tabulation, and/or adjustment. Finally, we are aware that 
the constraints imposed by the requirements of privacy and confidentiality 
are a critical factor to consider with regard to public use files. 